Handbook of Pupil Evaluation


PRITAM SINGH


This Handbook of Pupil Evaluation is 
written in the Indian context but in ac- 
cordance with the modern view of evalua- 
tion as a growth oriented ecological con- 
cept, using the philosophy of objective 
based evaluation which is reflected even in
the organisation of its content [...]
taxonomy of educational objectives, com- 
prehensive exposition of tools and tech- 
niques of evaluation and exhaustive treat- 
ment of evaluation data makes this com- 
pendium complete in itself. Emphases in 
evaluation as reflected in National Educa- 
tion Policy and the emerging demands on 
today’s evaluators are highlighted towards 
the end and form the basis for taking up
more complex and technical issues of 
measurement in the forthcoming compan- 
ion volume of this publication. Topics like 
question banking, continuous comprehensive evaluation,
diagnostic evaluation, criterion-referenced testing, illuminative
evaluation, grading and scaling, practical
examinations, measuring the affect, evalu- 
ation of curricula, textbooks, programmes 
and institutions, self evaluation, open book 
examinations etc. will be taken up for dis- 
cussion in that volume. 


ISBN 81-7023-203-1 Rs.180.00 


HANDBOOK OF
PUPIL EVALUATION 


PRITAM SINGH 


M.Sc., M.Ed. (Gold Medalist), Ph.D., Associate (London University)
Professor and Head, Navodaya Vidyalaya Cell (NCERT)


Formerly
Head, Department of Measurement and Evaluation and
National Talent Search Unit, NCERT, New Delhi


ALLIED PUBLISHERS LIMITED 


NEW DELHI BOMBAY CALCUTTA MADRAS NAGPUR
AHMEDABAD BANGALORE HYDERABAD LUCKNOW 


ALLIED PUBLISHERS LIMITED 


Regd. Off.: 15 J.N. Heredia Marg, Ballard Estate, Bombay 400038 
Prarthna Flats (1st Floor), Navrangpura, Ahmedabad 380009 
3-5-1129 Kachiguda Cross Road, Hyderabad 500027 
60 Bajaj Nagar, Central Bazar Road, Nagpur 440010 
16A Ashok Marg, Patiala House, Lucknow 226001 
5th Main Road, Gandhinagar, Bangalore 560009 
17 Chittaranjan Avenue, Calcutta 700072 
13/14 Asaf Ali Road, New Delhi 110002 
751 Anna Salai, Madras 600002 




First published 1989 
© Allied Publishers Limited, 1988
ISBN 81-7023-203-1 


PRINTED IN INDIA
by - Sharma for Veronica Press, 81-U.B. Jawahar Nagar, Delhi 110007
and published by R.N. Sachdev for Allied Publishers Limited
13/14 Asaf Ali Road, New Delhi 110002


To
the memory of my beloved son 
NAVDEEP 
who did not live to see this book 


ACKNOWLEDGEMENTS 


It gives me not only great satisfaction but also the privilege to express 
my sense of gratitude to the eminent educationist Professor 
P.N. Kirpal formerly Deputy Chief Executive of UNESCO and Secre- 
tary Education, Government of India, for his painstaking efforts in 
going through this book and giving a very concise, thoughtful and
expert opinion about this publication. My grateful thanks are due
to Dr. M.B. Buch, the well-known
educationist, researcher and academician, formerly Director of 
Centre for Advance Studies in Education, Baroda, and Head, erstwhile
Department of Field Services of the NCERT, for reflecting his
views about this publication especially highlighting its usefulness 
for teachers, teacher-educators, paper-setters, moderators and 
researchers. 

I am thankful to Dr. (Mrs.) Kamla Menon, Lecturer, NCERT by
whose courtesy one of her sample unit tests in Geography, Man And 
Food Supply was included in this book. For this I also acknowledge 
the NCERT for having used this material developed in the 
Department of Measurement and Evaluation. I would like to
mention particularly the names of eminent measurement specialists 
like Lindquist, Thorndike, Hagen, Guilford, Ebel, TenBrink, 
Hudson and Popham from whose writings I culled out various ideas 
and quoted in this book. Lastly, I would like to offer my apologies to 
all those whose contributions and publications might have been 
taken advantage of, in one form or the other, but could not be 
acknowledged by name. My special acknowledgements are due to the 
NCERT and other educational agencies which provided me the 
needed opportunity to develop, grow and improve my academic
acumen on the basis of which I could accomplish the task of writing
this book. 


PRITAM SINGH 




FOREWORD 


Examinations, at all levels of education, from the school to
the university and professional colleges, have to be taken by
students for admission, promotion and certification of
achievement. For purposes of employment, and particularly
for employment in the public services, one has to sit in
examinations. Hence, the importance of examination is
obvious.
With developments in the fields of education, training,
placement, personnel management and productivity, it has
become increasingly clear to all those who are concerned with
instruction, management of learning, students and learners in
general in various kinds of activities, that examinations are
basically concerned with value judgements, expressed in the
form of pass-fail, marks, grades and the like, regarding what
has been learned or acquired by the individual concerned.
Evaluation, instead of examination, seems to indicate better
what goes on in such situations. The purpose of any examination
in the school or college, be it tutorial, periodical,
terminal or half-yearly, is to evaluate what has been learned,
how much has been learned and how well it has been learned.
An evaluative judgement is expressed in the form of marks out 
of 10, 15, or any other set of marks for each question, with a 
total of 100 marks distributed more or less equally among a 
fixed number of questions to be answered, say 10 or 5 within a 
fixed period of time. 

During the last thirty years, there has been considerable
development in evaluation of pupil learning. A large variety of
concepts are in use, and for any teacher and practitioner of
pupil evaluation, it is necessary [...] individuals concerned
the purpose of [...]. In other words, as a result of research,
during [...]
we know now various factors which influence evaluation and,
therefore, for any worthwhile decision, we need to discriminate 
among the various types of evaluation and select the one 
which is most appropriate for the task in hand. 

For any evaluation, it is now necessary to prepare a scheme
first showing the specific educational objectives, the types of
items for each objective and the number of items in each type.
This is the beginning of objectivity of an evaluation, because
there must be a consensus among several evaluators on the
scheme of evaluation, including the rules of assignment of
marks. The items are usually of the objective type, which in
practice assumes the form of multiple-choice items with one
correct answer, or best answer, as the task for the student to
identify. But there are other types of items also. The important
thing is to have a number of items and reduce the task of
handwriting to a minimum. This is another aspect of the
objectivity of a test or examination. Between-examiner
variations are reduced to the minimum, practically to zero. But
preparing an instrument based on a scheme like the
one visualised here takes time and effort.

This Handbook of Pupil Evaluation focusses on pupil 
evaluation by classroom teachers in the Indian context, using 
the philosophy of objective-based evaluation and putting the 
same into practice in its content, organisation and exposition.
The author Dr. Pritam Singh has the classroom teacher, the
pupil teacher and the in-service teacher educator in view.
Written in a simple and direct style, this handbook is based on
a philosophy of objective based instruction and evaluation. In
addition, evaluation is treated as a growth-oriented, total
school concept and a feedback device that aims at improving
the educational process. 

The author has emphasised evaluation as an integral 
aspect of the instructional process, with a focus on improvement
of students' learning. The philosophy of objective-based
evaluation is reflected in the organisation of the book. The
major learning outcomes expected are highlighted in the
Preface, besides stating the Intended Learning Outcomes at the
beginning of each chapter, to provide direction to and focus
the attention of the readers on intended learning.


The first three chapters are on Historical background of 
measurement, Contemporary concepts and Basic statistical 
concepts and are useful for developing an insight into the 
changing emphases in testing, for better communicability among 
evaluators, for a better conceptual understanding and for a
better use of evaluation data in improving classroom learning. 
There is a chapter on Taxonomies of educational objectives. 
A comprehensive treatment is given to various taxonomies in 
cognitive, affective and psychomotor domains to reflect the 
different frameworks for teaching and evaluation. 

The Handbook is intended to be of practical use. The 
teachers would find a number of examples for testing different 
abilities, based on various content elements, using verbal and 
non-verbal media of testing, with the help of different forms of 
questions. In fact, quite a few innovative varieties of items are 
introduced in chapters on construction of various types of 
items. The technique of developing unit tests and question
papers is discussed in the Indian context, along with the changing
emphasis in designing and developing question papers in
various examining agencies.

It is good to see that in this handbook, the author, 
Dr. Pritam Singh, who has a number of years of experience in 
the NCERT in relation to evaluation of learning outcomes, 
has delineated in clear terms the various aspects of evaluation 
for the teacher and the teacher educator. He has plans, I
understand, to produce another book on the subject concerned 
with some other aspects of evaluation, which have not been 
covered deliberately in the present volume. 

I would like to congratulate Dr. Pritam Singh for under- 
taking this work of exposition of the concepts and methods of 
evaluation for the benefit of teachers in a simple and straightforward
manner, highlighting the Indian context. It is hoped
that the clientele of the handbook would find it rewarding in 
making evaluation a useful device for not only measuring 
students' achievement, but also for improving students'
learning.


New Delhi SHIB K. MITRA 
Date: 15.1.88 Formerly Director NCERT 


PREFACE 


This Handbook of Pupil Evaluation was visualised in two 
volumes, one dealing with fundamentals of measurement and 
process of evaluation while the other one, which is under print,
is to incorporate the more technical and complex concepts relating
to emerging trends in educational evaluation. Although origi- 
nally planned as one volume, compulsion of pages and the two 
types of clientele prompted me to bifurcate the content into 
two; one primarily for teachers and students of teacher training 
colleges and the other one for teacher educators and profes- 
sional readers. Accordingly, the first volume was restricted to 
concept, process and techniques of evaluation besides reflecting 
changes in emphases and the emerging trends. The focus of 
this handbook is on pupil evaluation primarily concerning cog- 
nitive domain, the other two domains being taken up in the 
second volume. 

While there is no dearth of foreign books on measurement
and evaluation, the fact remains that there is hardly any book
for the Indian teacher or student teacher which gives a comprehensive
view of evaluation and at the same time reflects modern
approaches to depict evaluation as a total school concept, a
growth oriented concept and an ecological concept in the
context of educational process. A deliberate attempt is made to
focus the attention of the reader on the intended outcomes of
each chapter so that he may be able to look for the relevance
of the material in each chapter to the intended learning. This
is indeed the basic philosophy underlying evaluation process
which envisages formulation of instructional objectives and
taking cognizance of teaching and testing. This makes the
reading of each chapter more meaningful and provides the
needed direction. For the interested reader each chapter is
documented and the references made in the text are given
chapter-wise at the end of the book. This I am sure would
encourage further reading besides providing authenticity of the
material quoted or referred to in the text.

There are 19 chapters in this book starting with historical 
background of measurement and ending up with emerging 
demands on tomorrow's evaluator. The first chapter, though it
has no relevance to the educational process, gives a backdrop
of the history of the field of measurement. It provides the
readers an insight into various efforts made ranging from earlier 
crude attempts to the modern approaches with well developed 
theoretical constructs, thereby reflecting a trend towards preci- 
sion, comprehensiveness and psychometric approach to measure- 
ment, with a focus on ecology of evaluation. 

Due to a lot of literature coming up in this field a web of 
evaluation terminology has developed. It has caused so much 
confusion due to overlapping of terms that for ordinary teachers 
it is difficult to see relationships among these concepts and judge 
their scope and implications. It was, therefore, necessary to 
acquaint the readers with the contemporary concepts in the 
field of educational evaluation. Likewise assuming that most of 
the teachers may not be conversant with basic statistical con- 
cepts, a chapter on use of some basic concepts in statistics was 
considered a necessity to enable the readers to understand and 
appreciate the discussion made in the chapters involving statisti- 
cal concepts. 

Need for a thorough discussion of instructional objectives in
any book on evaluation cannot be over-emphasised. Since
classification of goals and their role in teaching and testing is a
field by itself, their formulation, derivation, specification and
statement was necessary in order to highlight objective based
evaluation vis-a-vis objective based instruction. This chapter
paves the way for evaluating learning outcomes and forms the
basis for objective centred instruction and objective based
evaluation. 

Before discussing the process in the form of sequential steps 
it was necessary to dispel some misconceptions about the nature,
scope and function of evaluation in the light of modern view of
evaluation and its place in the teaching learning process.
Besides differentiating the various general characteristics,
emphasis is laid on treating evaluation as a growth oriented,
total school concept concerned with quality control and diagnostic
functions. This is followed by a chapter on evaluative
steps involving information gathering, judgement forming and 
decision taking. 


The chapter on Tools and Techniques forms the crucial content
of this handbook. It discusses the four basic techniques of
evaluation, viz. testing, observation, inquiry and analysis, with
reference to nature, purpose, scope and tools involved in each
of these techniques. Test construction being the nucleus of
educational evaluation, separate chapters are devoted to the
discussion on genesis of a question, constructed questions (long
answer and short answer), multiple choice items and other
varieties of objective type questions. These chapters are very
well illustrated with examples testing different abilities.

Forms of questions are followed by the technique of construction
of achievement tests. For this a separate chapter is devoted to
the theory of measurement, discussing in detail the three basic
qualities of measuring instruments, viz. validity, reliability and
usability. This is followed by a separate chapter each on unit
testing and paper setting. Designing, developing blue print,
question framing, consolidation, developing marking schemes
and question-wise analysis are described besides moderation
techniques. How validity, reliability and usability can be taken
care of in such achievement tests is indicated besides providing
rationale underlying designing of these tests. Analysis of questions
and question papers forms the next chapter that deals
with distractor analysis, facility value and discrimination value
of items in norm-referenced as well as criterion-referenced
tests besides other statistical concepts related to measurement
data. Simple techniques of analysis are discussed to enable the
teachers to use analysis for improvement of questions and
question papers.
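
As a rough illustration of the item statistics named above, the following sketch computes a facility value and a discrimination value for a single question. It assumes the commonly used definitions (facility value as the proportion of pupils answering correctly; discrimination value as the difference between the proportions correct in the top and bottom scoring groups, here taken as 27 per cent each by convention). The function names and the pupil data are hypothetical, and the handbook's own chapter on analysis may define or compute these values differently.

```python
def facility_value(responses):
    """Proportion of pupils answering the item correctly (1 = right, 0 = wrong)."""
    return sum(responses) / len(responses)


def discrimination_value(scored, fraction=0.27):
    """Upper-group minus lower-group proportion correct on one item.

    `scored` is a list of (total_test_score, item_response) pairs.
    `fraction` is the assumed share of pupils placed in each extreme group.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper = [item for _, item in ranked[:n]]
    lower = [item for _, item in ranked[-n:]]
    return facility_value(upper) - facility_value(lower)


# Hypothetical data: (total score on the test, response to the item studied)
pupils = [(48, 1), (45, 1), (41, 1), (38, 0), (35, 1),
          (30, 1), (27, 0), (22, 0), (18, 1), (12, 0)]

fv = facility_value([item for _, item in pupils])
dv = discrimination_value(pupils)
print(f"facility value = {fv:.2f}, discrimination value = {dv:.2f}")
```

Under these assumptions, a facility value near 1 would mark an easy question, and a discrimination value near zero or below would suggest a question that fails to separate stronger from weaker pupils.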

In order to enable the readers to become aware of the
changing emphases in examination reform and educational
evaluation, trends are highlighted in a separate chapter. These
trends relate to concept, purpose, focus, scope, approach,
instrumentation, methodology of judgement making and decision
making aspects of evaluation. The purpose is to make them
conscious of the need for using evaluation not only for
measurement of students' achievement but also for improvement
of their achievement.

The penultimate chapter is of informative nature
rather than reflective. It describes the place of evaluation and
its role as reflected in the National Education Policy and the 
corresponding Programme of Action. It was added later to 
highlight the priorities in educational evaluation. This chapter 
lists recommendations at school as well as college/university 
level and provides direction for future work in this field. 

The last chapter on Emerging Demands on Today's Evalua- 
tors is specifically included to identify the priority areas and 
emerging trends in philosophy, psychology and sociology of 
educational process which have corresponding demands on 
pupils, teachers and educational administrators for evaluating 
students, teachers and institutions. These learner-based, teacher- 
based and administrator-based demands are to be met by 
today's evaluators who have to plan, develop and organise 
evaluation process in a way that it meets requirements of the 
clientele. These demands are crystallised in this chapter in the 
form of indicators like the need for diagnostic testing, question
banking, criterion referenced testing, continuous comprehensive
evaluation, scaling and grading, improvement of practical tests,
measurement of affective outcomes, curriculum evaluation,
institutional evaluation, needed research etc. All these issues
are discussed in the forthcoming volume which is under print. 

This handbook can best be used by our clientele if, while
reading or while teaching this subject, they keep in view the
following main themes of this book, which are best indicated by
means of a list of the major intended learning outcomes that
should result from its use:


(1) Recall the history of educational measurement as well as 
the nature and scope of various terms and concepts used 
in educational evaluation, from time to time. 

(2) Understand the modern concept, process and steps involv- 
ed in educational evaluation. 

(3) Identify inter-relationship of teaching, learning and 
evaluation. 

(4) Formulate, classify, specify and state instructional objec- 
tives in performance terms. 




(5) Calculate various statistics and use them in interpreting 
evaluation data. 


(6) Apply the concept of validity, reliability and usability in 
construction, selection, interpretation and use of tests and 
other tools of evaluation. 


(7) Develop the skill to construct, select and use various types
of questions and tests.


(8) Use non-testing techniques of evaluation in appraising
students' learning and growth.


(9) Analyse and interpret tests and test results. 


(10) Appreciate the potentialities and limitations of various 
evaluation procedures used in schools. 


(11) Get insight into the role of evaluation as a quality 
control device and in improvement of the teaching learn- 
ing process. 


(12) Become aware of the modern trends, priorities and changing
emphases in educational evaluation.


While writing this book students of teaching and practicing
teachers are presumed not to have had any previous special
training in educational measurement. That is why no attempt is
made in this volume on complex and highly technical issues
of measurements. It is written for those who are directly
involved in teaching, paper setting, examining, testing and
interpretation of test results of either external or internal
examinations. As such teachers, pupil teachers, teacher educators,
paper setters, evaluators and examiners are the clientele
for whom practical guidelines are provided besides giving the
relevant concepts and principles underlying, besides philosophical,
rational and empirical bases for understanding and
accepting the suggestions offered.

There is hardly any doubt that little that is written in this
handbook has not been said before. Author's best hope is no[...]
he has used his experience of 25 years in this field, as a guide
for selection of material from various sources and presenting it
in a manner that it becomes meaningful to the readers. Most of
the illustrative materials included in this book are developed and 
used by the author in various training programmes conducted by 
him at state, national and international levels. Nevertheless, it 
was quite impossible to discuss the various topics to the desired 
extent because of the constraint imposed on number of pages. 

I take this opportunity to acknowledge my grateful thanks 
to all my colleagues from N.C.E.R.T. with whom I had the 
privilege to interact, conduct evaluation workshops and training
programmes that provided me more and more insight into the 
development of materials and the related concepts. 

For many of the new ideas and concepts discussed, I owe
my sense of gratitude to measurement experts like
B.S. Bloom, Walker Hill, Krathwohl, Arie Lewy, Klopfer,
A.E. Harper (Jr.), S.K. Mitra, R.N. Mehrotra, R.H. Dave,
P.N. Dave and many other experts from whose lectures, discussions
and books I had my professional growth in the field of
educational evaluation and ventured to write this book.

I am indebted to Dr. S.K. Mitra for having agreed to go 
through the manuscript and write a foreword for the same. It 
was indeed a great task for him and a pleasant satisfaction for
me to have a foreword from a great psychometrician and an
eminent educationist. 


[...] during writing of the manuscript.

It is my fond hope that this Handbook of Pupil Evaluation
will be useful to all teachers, [...]
teacher educators especially fr[...]
guidelines provided for imp[...]
control device to improve their learning.


To know what a student knows or knows not is only 
measurement of his achievement but to know why he 
knows not and how he can know what he knows not, is 
improvement of his achievements. 


New Delhi PRITAM SINGH 
13th April 1988 Professor 
National Institute of Education 

N.C.E.R.T., New Delhi 


CONTENTS 


Foreword 
Preface 
I. Examination Reform in Retrospect
II. Contemporary Concepts in Educational Evaluation
III. Basic Statistical Concepts in Measurement
IV. Instructional Objectives and Evaluation
V. Modern Evaluation - A Total School Concept
VI. Steps in Evaluative Process
VII. Tools and Techniques of Evaluation
VIII. Genesis of a Question
IX. Constructed Questions (Supply Type)
X. Objective-Type Items (Multiple Choice Type)
XI. Objective-Type Items (Other Varieties)
XII. Qualities of a Good Measuring Instrument
XIII. Unit Testing - A Developmental Approach
XIV. Technology of Setting Better Question Papers
XV. Moderation of a Question Paper
XVI. Analysis for Improving Questions and Question Papers
XVII. Evaluation Trends in 10+2 Pattern of Schooling
XVIII. Evaluation in National Policy on Education - 1986
XIX. Emergent Demands on Today's Evaluator
References 
Index


CHAPTER 1


EXAMINATION REFORM IN RETROSPECT 


1. Origin of Examinations 


Examinations have played differential roles in the strategy of 
educational programmes. In fact they have played a far more 
influential role in human history than is generally recognised. 
Examinations, which indeed occupied the position of measure- 
ment and evaluation, originated in antiquity. Since then an 
examination has been considered as a judgemental balance, a 
sacred cow, a Socratic quiz, an infallible instrument of evaluation,
a necessary evil, a yardstick for measuring the efficiency
of teachers, a standard setter, a learner's goal, parents' focus
and a societal criterion of employment. Their prominent position
in the social [...], key role in educational set-up, strangulating
influence on the teaching-learning process and omnibus use in 
certification, selection, learning, teaching and employment have 
made examinations a prestigious institution. Their role at 
different times can be appreciated if we trace the origin and 
history of these examinations from their earliest use to the 
present day. 

Perhaps the first evidence of oral examinations is found in a
story told in the Old Testament (1) of the pronunciation test
the Gileadites gave to their enemies, the Ephraimites, who had
escaped to cross the Jordan. In order to recognise an Ephraimite,
each man was asked by the men of Gilead if he was an Ephraimite.
If, to escape arrest, he said "Nay", he was then asked to say
"Shibboleth". If he pronounced it as "Sibboleth", he was
slain at the passage of the Jordan. This was the one-question
test of the objective type which forty two thousand Ephraimites
failed to pass and, therefore, were killed. It is a different thing
that the use of measurement was made for a different purpose
depending upon the goal or value of measurement. If some of
the Ephraimites pronounced correctly or the tester himself
could not discriminate between the actual and the intended
pronunciation, the sword of judgement might have fallen on the
wrong man. This is a question of the relevance and dependability
of the testing device, which is a separate issue.

Socrates (469?-399 B.C.) used oral quizzing by putting
searching questions. Job interviews are nothing but oral tests. 
Up to the middle of the 19th century oral examinations were 
used in American schools as a standard. Some countries still 
require oral final examinations by law (Stanley, 1960). 'Thesis
oral' in American Universities is a vestigial form of this.

Written tests are comparatively of more recent origin than 
oral testing or quizzing but even they date back to 2200 B.C. 
when the Chinese are believed to have had a national system of 
examinations to select their public servants. Confined to isolat- 
ed cells, the candidates were assigned topics on which a long 
paper or treatise was to be written. Recognition of individual
differences, in the 4th century B.C., led Plato to divide his
ideal society into
three classes, the workers, the protectors and the rulers. An
impressionistic judgement of personality on the basis of
physiognomy, body build or divination led to classes of persons
specialising in astrology, palmistry, phrenology and graphology
(2).

The first educational tests were like present performance
tests in that they did measure physical performance. In primitive
tribes, young men were tested for their knowledge of tribal
customs, endurance, bravery, hunting skills etc. needed for
protection. The ancient Spartans (3) conducted examinations as
early as 500 B.C. to test physical development and stoicism
which were stressed in their educational curricula. Evaluation
of athletics and aesthetic development by means of games and
contests in reading, writing and singing in ancient Athens (4) is
another example of the use of tests. In medieval times, oral
examinations were used in the University of Bologna by 1219 A.D.
and the University of Paris by the end of 13th century 
when candidates were required to defend their thesis orally. 
Perhaps for the first time educational use of written examina- 
tions was made at Cambridge, England, in 1702 (5). 


2. Developments during 19th Century 


One of the earliest records of educational testing in America 
dates back to the written examination in Boston in the year 
1845 (6). Prior to that, oral examinations prevailed, which be- 
came impracticable due to increasing number of pupils and the 
time-consuming process of oral testing. Arithmetic, astronomy, 
geography, grammar, history and natural philosophy were the 
subjects of examination. This Boston examination project was 
highly commended by Horace Mann, the then Secretary of the 
Massachusetts Board of Education. He advocated (7) the 
replacement of oral by written examinations on various grounds 
like those of impartiality in scoring, thoroughness, no favouritism,
availability of information to all, and amenability to
scrutiny of questions by all. 

Rev. George Fisher (1864), an English school master, devised
the first measurement device, the 'scale books', for measuring
achievement in handwriting, spelling, mathematics, navigation,
grammar, composition, drawing etc. Pupils' specimens were
compared with standard specimens and ranking was done to
determine numerical rating. Quantitative evaluation was done
in terms of errors in performance as in spellings (8). Despite the
great potential of Fisher's scale books, his ideas did not last
long as he was ahead of his time in his educational thought.

E.E. White (1886), another American educational writer and 
school administrator wrote, "It may be stated as a general fact
that school instruction and study are never much wider or better
than the tests by which they are measured." He gave several
special advantages of written tests such as tangible and reliable
results, more accurate comparison of pupils' progress, revealing
more clearly the defects of teaching etc. 

Dr. J.M. Rice (1894) is considered the real inventor (9) of
the comparative tests which became the basis of objective
measurement. He administered a list of spellings to 13,600
students from 21 cities, spread over 16 months, and in 1897
presented his paper to the Superintendence of the National
Education Association. The finding that pupils who studied
spellings for 30 minutes a day for 8 years were no better than
those who studied for 15 minutes a day, for the same period,
surprised everybody. He was attacked and defamed for this
'heresy'. It was only a decade later that his objective method
got attention in educational testing. The value of what we now
call 'norm' is the resultant outcome of his investigation. Rice
was not only a tester but a pioneer of progressive education.

During this century we may, therefore, witness three main 
developments. The first development is the popularity of the 
written examinations over the oral testing. The second develop- 
ment is the refinement of evaluative devices to ensure more 
objectivity in measurement. Thirdly, a breakthrough is made to
compare performance of bigger groups of students, a step 
towards development of norms. 


3. Developments during 20th Century 


From the rudimentary approaches of the 19th century we now
turn to a survey of the more comprehensive and scientific
approaches that began at about the year 1900. Scates (10) traced the
trends in measurement and evaluation from 1897 to 1946, dividing
them into five periods of one decade each up to 1950. The last three
decades are also added in this exposition to bring the account
up to date.


3.1. First decade (1900-1910) 


The first decade witnessed the publication of 'The Futility of the
Spelling Grind' by Rice (in fact in 1897), which shattered the
belief that the product of learning was intangible and that only the
teacher of a particular class could appraise it. The first Binet
Intelligence Scale and the beginning of achievement tests in the
basic skills of arithmetic and language arts are the other
special features of this decade, which may be called an
incubation period. It was in this period that the first book on
mental and educational measurement was written by Thorndike (11)
in 1904. This book greatly influenced the early development
and popularisation of standardised educational tests. In
1908 Thorndike's student Stone (12) published the first
standardised instrument in arithmetic reasoning, while Thorndike
(13) himself published in 1909 his Scale for the Handwriting of
Children, which was the first standardised achievement scale.


3.2. Second decade (1911-1920) 


During this decade some of the intelligence tests and achievement
tests were standardised. For the first time standardised
achievement tests were used in the city of New York for a
survey during 1911-13. Ayres published a handwriting scale in
1911, Hillegas a composition scale in 1912 and Buckingham a
spelling scale in 1913; Woody's arithmetic scale came in 1916 and
Terman made the Stanford Revision famous by 1916. The introduction
of group intelligence tests in World War I was another
landmark of this period, besides the fight for objectivity in
educational measurement. It was McCall (14) who was the first to
state publicly, in 1920, that teachers, instead of using
standardised achievement tests, could construct their own objective
tests, then called the 'new type' of tests, for classroom use.

3.3. Third decade (1921-1930)


In 1924 Ruch (15) published the book on informal objective
tests with emphasis on teacher-made classroom tests. Test
results were considered as only one type of evidence of pupils'
achievement, which reflected the tendency towards a broader
scope of measurement than mere testing. Evaluation of all
aspects of growth and the use of a wider variety of tools and
techniques characterised the informal objective testing movement
of Tyler. This period, therefore, highlighted the broader concept
of measurement so as to encompass a wider area of assessment
of pupils' growth. More than 1,000 standardised tests had
appeared by this time. The development of statistical techniques
received a lot of attention during this period.

3.4. Fourth decade (1931-1940) 


The contribution of Tyler (16) was felt more prominently in the
field of objective testing, although he made significant
contributions to standardised testing also. He outlined the steps of
construction and validation of tests by emphasising
instructional objectives as the basis for both. Recognition of pupil
behaviours as indicators of attainment of instructional objectives or
desired learning outcomes was his outstanding contribution.
The extension of the scope of achievement testing to the more
intangible instructional outcomes was a precursor to the broad
modern concept of educational evaluation. The Cooperative
Test Services began publishing a number of parallel forms of
tests for secondary schools. Personality tests like those of
Rorschach, interest inventories, attitude scales, sociometric
devices and anecdotal records were introduced during this
period. By 1940, over 2,600 achievement tests were available in
all traditional subject areas. During this period, the first Mental
Measurement Year Book (1936) by O.K. Buros was published.
In 1940 the National Council on Measurement in Education
was established.


3.5. Fifth decade (1941-1950) 


Whereas the 1930s were a period of extension
and expansion, the 1940s were marked by maturing and refinement
of tools and techniques. Big programmes like the Appraisal
of Activity Programme (17) for the Elementary Schools and the
famed Eight Year Study (1942) of 30 high schools (18) involved
a comprehensive design. Under the direction of Tyler, a series of
instruments measuring students' ability to reason, apply,
interpret etc. was developed under the Eight Year Study. Under
the Appraisal of Activity Programme, basic skills in reading,
language arts and arithmetic, map reading, using indexes,
interpreting charts, graphs and tables, interests, personal
social adjustments, creative expression etc. were tested.

The pupil evaluation movement is exemplified by the work
of Lindquist and Tyler, which not only pointed out the neglect of
significant areas of pupils' behaviour but also led to the
increasing emphasis on assessment of higher mental processes. In 1947,
the Educational Testing Service was established. This concludes
the five decades of developments classified by Scates. We may
extend the survey to the remaining three decades to bring the account
up to date.


3.6. Sixth decade (1951-1960) 


This period marks the post-independence era. The publication
of a Technical Standard for Achievement Tests by AERA-NCME
in 1955, Bloom's Taxonomy of Educational Objectives: Cognitive
Domain (19) in 1956 and Project Talent (1959), intended to identify,
develop and utilise human talents, were important landmarks in
the history of educational measurement. The implications of
Bloom's Taxonomy for measurement and evaluation will be dealt
with at greater length while dealing with educational objectives.


3.7. Seventh decade (1961-1970) 


The outstanding contribution of this period is the
publication of the Taxonomy of Educational Objectives: Affective
Domain (1964) by Krathwohl (20). This publication gave a
new dimension to evaluating students on appreciations, interests,
attitudes, values etc. Its details are also discussed in the chapter
on objectives. It was during this period that the first issue of
the Journal of Educational Measurement (1964) was published,
thereby providing a forum for discussion and exchange of
research in the field of measurement. A national assessment
of educational progress also began during this period. A
taxonomy of the Psychomotor Domain was published in 1966 by
Simpson (21), which provided a hierarchical classification of
the goals of the Psychomotor Domain.


3.8. Eighth decade (1970-1978) 


Another taxonomy of the Psychomotor Domain, devised by Harrow
(22) in 1972, provided still another classification of psychomotor
goals and a basis for measuring practical skills in the field of
sciences and practical arts. Not much work has been done in
this area in terms of hierarchical testing. During the seventies
emphasis on the identification and nurturing of talents continued to
attract more and more attention. An important inroad has been
made by the accountability aspect in the field of evaluation
studies concerned with the various curriculum projects. The
emphasis is shifting from the psychometric approach to evaluation
in schools to the system analysis approach, in which
more stress is laid on the opinion of the personnel who use
various tools and techniques than on the tools and
techniques they use. Testing is only one part of evaluation.


3.9. Changing foci 


Achievement testing is only one aspect of evaluation.
Intelligence testing, aptitude testing and personality testing are other
areas which have their own history of development. In the
preceding pages a brief historical retrospect has been given
relating mainly to the birth, infancy, adolescence and adulthood
of the testing movement. Since the present compendium is
focussed on the examination reform movement in India, further
details about mental measurement are not called for. Before
taking up the case history of the evaluation movement in India
we may identify the changing emphases or foci of
measurement as under:


(i) Discrimination of persons on the basis of oral expression.
(ii) Selection or classification of persons on the basis of
written ability.
(iii) Survival skills as a basis of testing, for physical
development.
(iv) Written word in preference to oral word for objective
measurement.
(v) Comparative performance for developing group norms.
(vi) Empirical studies for standardisation of tests for wider use.
(vii) Cognizance of intangible and tangible instructional
objectives for teacher-made tests.
(viii) Refinement of tools and techniques of testing.
(ix) Use of statistical methods in educational measurement.
(x) Publication of tests, journals and texts.


Two major trends that one can observe from these foci are
the broadening concept of measurement and the increasing
concern for making measurement more and more scientific.


4. Examination Reform in India 


4.1. Pre-independence developments 


(i) The Wood's Despatch (1854)

The Entrance examination started with the establishment of
three universities in the presidencies of Bombay, Calcutta and
Madras in 1857, the year of the Indian mutiny. This examination
was a sequel to the recommendation of the famous
Wood's Despatch (23) of 1854. These 'Affiliating Type'
universities were founded at that time to ascertain, by means of
examination, the persons who had acquired proficiency in
different branches of learning and 'rewarding them by academic
degrees', as evidence of their respective attainments. The main
objective of this Entrance examination was to select, from high
schools, the students who were suitable for admission to higher
education. Thus the first public examination was instituted to
certify students for entry into the universities. The courses of
study being only academic and unrelated to life, instruction
was dominated by the Matriculation (Entrance) examination,
not only in the secondary but even in the primary schools.
As reported in the Review of Education in India (1886), the
Matriculation examination led to six external examinations from
Matriculation downward to the lower primary.


(ii) Hunter Commission (1882)

This commission (24) made a valuable survey of the secondary
schools of that time and recommended two types of avenues,
one leading to the entrance examination and the other of a
more practical nature leading to non-literary vocational pursuits.
Secondary education on the basis of the grant-in-aid system
was considered as an alternative to the growing cost of the
expanding secondary education. But hardly any province adopted
vocationalisation, and education continued to be mostly
academic, completely dominated by the Matriculation
examinations.




(iii) Indian University Commission (1902)

This Commission (25) reviewed the position of the universities
regarding the higher grades of examination. It pointed out that,
"The greatest evil from which university education suffers in
India is that teaching is subordinated to examination and not
examination to teaching." In contrast to the tradition of general
education in India, examinations started dominating the whole
system of education to such an extent that they became
instruments of general education. Assessment of grants to aided
schools by results further aggravated the demoralising influence
of examinations on education. As reported in the Fifth
Quinquennial Review (1902-1907), only two examinations, the Primary
and the Anglo-Vernacular, were retained for the primary stage and
the secondary stage, partially to meet the condemnation of
too many external examinations.


(iv) Lord Curzon’s Resolution (1904) 

It suggested (26) the introduction of alternative courses to meet the
needs of students who were destined for commercial or industrial
pursuits. As a result, certain departments of education in
Bombay, the Central Provinces and the United Provinces
instituted many schemes of school-final examinations. But these
examinations could not become popular, as the matriculation
examination continued to dominate the scene and get preferential
treatment for admission to the universities.


(v) Resolution of 1913 
From 1854 to 1913 there was a phenomenal expansion of
secondary education which resulted in the deterioration of the
quality of output. This, according to the 1913 Resolution (27),
affected the standard of higher education. This resolution
appreciated the need of an external examination as a means of
maintaining standards on the one hand [...] progress and conduct
of pupils, the use of written as well as oral examination, the
strengthening of supervisory staff and the need for making internal
assessment a pre-requisite to present candidates for external
examinations, were the major recommendations. The combination
of external and internal assessment was considered the
most important reform required in secondary education,
next only to the pay and prospects of teachers, according to
this resolution.


(vi) Calcutta University Commission (1917-1919) 

This commission, popularly known as the Sadler Commission
(28), opined that a mechanical system of marking encouraged
memorisation and impaired mental development. The need
for a small board of examination, whose function should be to
criticise and suggest improvement, was emphasised. It made the
following important recommendations:


(a) That an institution of Intermediate colleges be created to
provide instruction in arts, science, medicine, engineering,
teaching etc. This stage may be considered as a dividing
line between the university and the secondary courses.
Admission to the universities should be made after passing
of the Intermediate examination.

(b) A Board of Secondary and Intermediate Education consisting
of representatives of Government, University, High
schools and Intermediate colleges be established to
administer and control Secondary Education.


Intermediate Education thus got dissociated from University
Education. Dacca, U.P., Panjab and Bihar universities
gave this a trial. U.P. created a Board of High and Intermediate
Education. Most of the other provinces did not take it up, and
universities continued to control Intermediate examinations.


(vii) Hartog Committee (1929) 


This Committee (29) reviewed the position of education in the
country and suggested diversified curricula. The Matriculation
examination was found to still dominate the Secondary courses.
The Committee recommended the retention of more boys
intended for rural pursuits in the middle vernacular schools.
Diversion of more boys to commercial and industrial
careers by providing alternative courses was recommended in
view of the high percentage of failures and the rush to
university education.


(viii) The Sapru Committee (1934) 

Appointed by the U.P. government, this Committee (30) observed
the practice of preparing students for examinations and degrees
rather than for vocations. It recommended the diversification
of courses at the secondary stage and the abolition of Intermediate
colleges, with the introduction of 11 years of schooling as
5+3+3 followed by 3 years of degree courses. The Abbott-Wood
Report (1936-37) recognised the problem of vocationalisation
and suggested a complete hierarchy of vocational
institutions parallel to the hierarchy of institutions of general
education.


(ix) The Sargent Report (1944) 

This was the post-war educational development report (31) of
the Central Advisory Board of Education. It visualised a system
of universal, compulsory and free education for all boys and
girls between the ages of 6 and 14, the middle or senior basic school
being the terminal stage for the majority. Academic and
technical courses were recommended in the 11 year schooling
proposed in this report.




4.2. Post-independence developments


(i) Provincial Committees (1948)

The Acharya Narendra Dev Committee (1948), appointed by
the U.P. government on the reorganisation of primary and
secondary education (31a), recommended the introduction of
cumulative records, rules of promotion, the use of written, oral
and practical tests, the use of intelligence and aptitude tests and the
establishment of Bureaus of Examinations. At the same time,
in C.P. and Berar, a Committee (31b) on the reorganisation of
secondary education recommended the following:


(a) Classification of purpose of examination in each subject
and its mode.

(b) Maintenance of records to assess objectives not testable
through written examinations.

(c) Valuing of scripts by two evaluators and use of moderation
procedures.

(d) Careful selection of paper setters and examiners.

(e) Use of written, oral, practical examinations and records of
pupils.

(f) Trial of symbolic system of marking except for the top
few.


(ii) University Education Commission (1948-49) 

This commission, popularly known as the Radhakrishnan
Commission (32), recognised that examinations, as they were, formed
one of the worst features of Indian education because of their
pernicious effects on the whole teaching-learning process. The
need was felt to make examinations relevant to the ends in
view. The Commission was so alarmed by their domination that
it expressed the view that "if we are to make a single
recommendation for the whole university education it should be
that of improvement of examination". The need for objective
testing, appraisal of existing practices, appointment of
full-fledged boards of examiners in each university, the construction
of batteries of psychological and achievement tests, the
delinking of university degrees from administrative services and
credit to classwork were the major recommendations made by
this Commission. A scheme of self-contained units of work and
allocation of one-third of the marks to course work for each subject
in B.A./B.Sc. were also recommended.


(iii) The Secondary Education Commission (1952-53) 
A diversified curriculum followed by a public examination at
the end of 11 year schooling was the main viewpoint expressed.
Other major recommendations of this Commission (33) were as
under:


(a) Reducing the number of external examinations and the
subjectivity of essay type questions by introducing objective
questions.

(b) Maintenance of systematic records about pupils' growth in
all aspects during the whole year and credit to this internal
assessment in the final evaluation.

(c) Use of symbolic rather than numerical marking in external
as well as internal examinations.

(d) Indicating internal test data and other records along with
the external examination results in the certificate awarded.

(e) Introduction of compartmental examinations at the final
public examinations.


(iv) U.P. Committee on Reorganisation of Secondary Education 
(1953) 

This committee (34) reflected on the invalidity, unreliability and
subjectivity of assessment in a single examination consisting of a small
number of questions marked by a huge number of examiners.
It recommended replacement of external examinations by
teachers' assessments made during the session. The need for
coordination of the standards of assessment of different teachers in
a subject from different schools and the use of scaling were
suggested in this report.


(v) Report of the International Team (1954) 
This team suggested the abolition of private tuition. It proposed
that, as with the G.C.E. of England, any number of examination
subjects may be taken and the examination may be spread over
stages. For the development of objective
type tests, the establishment of an All India Educational
Research Centre was suggested by the team.


(vi) All India Council for Secondary Education (1955) 
With the establishment of A.I.C.S.E., its first meeting (35) was
held on October 3-4, 1955 and it recommended the following:


(a) Secondary education boards should undertake research in
this field by setting up a Research Unit.

(b) Training colleges should work in close collaboration with
the Boards and conduct research, especially for improving
essay type questions.

(c) A Committee should be set up to advise and report to the
Council on the matter of examinations.


As a sequel to these recommendations, a seven-member
Committee was set up. The terms of reference were as follows:


(a) Improvement of written examination and the feasibility of
the objective type questions.

(b) Development of a scheme for improving practical
examinations.

(c) Study of the evaluation and examination criteria of States
with a view to improving standards.

(d) To determine the usefulness of school records for selection of
students to secondary schools.

(e) To develop achievement tests by State Boards.


(vii) Bhopal Seminar (1956) 

This seminar, held in February 1956, is considered a landmark
in the history of Examination Reform. It recommended the
setting up of a section in A.I.C.S.E., comprising members from
the A.I.C.S.E. and coopted experts, to coordinate the work of State
Research Bureaus and to advise the Council in maintaining and
improving standards. This section was to be assisted by a
Bureau of Examinations. The main function of this section, as
envisaged by the seminar, was coordination, collection and
dissemination of information about improved tests and the research
work of the state bureaus, besides advising on improvement of standards.
An Examination Committee appointed by the Chairman of
A.I.C.S.E. met at Baroda in December 1956 and recommended
the establishment of the Examination Unit. Major
recommendations of the Bhopal Seminar were as follows:


(a) Inclusion of short answer and objective type questions
besides the essay type in the external and internal
examinations.

(b) Weightage of at least 20% to the sessional work based on
cumulative records.

(c) Establishment of a Bureau of Examination Research by
each Board of Secondary Education which should work in
close collaboration with the training colleges, university
departments of education and other state agencies engaged
in research.

(d) Modification of the internal school examinations to ensure
successful working of external examinations.


(viii) Visit of Dr. B.S. Bloom (1957-58) 
A.I.C.S.E. was fortunate to get the expert services of the
world-renowned authority on examinations, Dr. B.S. Bloom, Head,
Board of Examiners of the University of Chicago. He visited
schools in India, met teachers, teacher educators and
administrators, and conducted six evaluation workshops in
which about 200 teachers participated. In these workshops he
emphasised the integral relationship of educational objectives,
learning experiences and evaluation procedures, which
formed the guiding philosophy of the programme of Examination
Reform propounded by Dr. Bloom. Accordingly, a plan
of action was formulated which formed the foundation of a
countrywide movement of Examination Reform. A ten-year
phased programme was chalked out which included three main
aspects:


(a) Setting up of realistic, significant purposes and goals of
learning.

(b) In-service training of personnel to implement those
purposes.

(c) Development of internal and external evaluation procedures
to serve those purposes.


This phased programme envisaged that examinations could
not be eliminated altogether (35a). It was visualised that
changes in the initial three to four years would be slow but were
intended to bring about a major reorientation and new attitudes.
Evaluation procedures would always be somewhat in advance
of the development of learning experiences. New types of
questions would be introduced at first on a small scale and then
would be gradually increased at successive examinations.
The detailed action plan was later put before the first
conference of the Chairmen and Secretaries for approval.


(ix) First Conference of Chairmen and Secretaries (1957) 

This conference held at New Delhi in April 1957 (36) was the 
one where Dr. Bloom presented the full plan of action for 
scrutiny and acceptance. The following recommendations are 
worth mentioning in this connection: 


(a) General approval of the programme of action obtained by
Bloom.

(b) Endorsement of the view about evaluation as integral part
of the total educational process.

(c) Setting up an Examination Unit at the A.I.C.S.E.

(d) Implementation by the state boards of the recommendations
of the Bhopal Seminar relating to cumulative records,
supply of achievement tests to schools, weightage to school
assessment in external assessment and the use of objective
type and short answer type questions in external
examinations.

(e) Setting up of research units by the boards to undertake
studies on examinations.

(f) Agreement on a 10-year phased programme of improvement
in gradual stages.


(x) Establishment of Pilot Examination Unit (1958) 

Consequent upon the recommendation of the first conference,
a Pilot Examination Unit was set up in A.I.C.S.E., which later
became a part of the Directorate of Extension Programme
of Secondary Education, better known as D.E.P.S.E. Five
evaluation officers recruited for the unit were posted to five
zones of the country at the Extension Services Centres. During
five months 60 orientation workshops were organised for 1500
teachers. Ten more officers of different subjects were sent to the
U.S.A. for training at Chicago under Dr. Bloom. On return
they joined D.E.P.S.E. as Evaluation Officers. They were further
groomed in India for one month by involving them in different
seminars and workshops organised for the purpose. Two more
regional workshops were held at Hyderabad and Chandigarh
for selected teachers of secondary schools in eight subjects
under the guidance of Dr. Bloom. It was assumed that external
examinations in the country would continue and that the use of
objective based questions would bring educational improvement.
The special role of teacher training colleges was emphasised in
developing learning experiences, training of preservice teachers
in evaluation and the need for inservice education of teachers
in this field. The major functions of the Pilot Examination Unit
were proposed as under:


(a) To work with teachers and training college lecturers in
order to help them to identify, clarify and enlarge
instructional objectives, develop learning experience and improve
evaluation tools.

(b) To work with Boards of Secondary Education to improve
their paper setting and other examination practices.

(c) To work intensively with selected schools to try new
practices in classroom.

(d) To train persons for State Evaluation Units.

(e) To develop item pools in various subjects.

(f) To develop tools for internal assessment.

(g) To conduct research related to internal and external
examinations.

(h) To publish literature on educational evaluation.


(xi) Second, Third and Fourth Conferences (1958, 1959, 1961) 


Among others, the following recommendations were made by 
these conferences (37). 


(a) Establishment of State Evaluation Units.

(b) Development of a pool of test materials.

(c) Including the new concept of evaluation in B.Ed./B.T.
syllabus as early as possible.

(d) Organise short-term courses in evaluation for trained
teachers.

(e) D.E.P.S.E. should give publicity to disseminate the new
evaluation concept.

(f) Setting up of a Central Bureau of Curriculum
Development.

(g) Two-level examinations in English and Mathematics.

(h) Examine possibility of announcing performance of students
rather than declare them as pass or fail.

(i) Training of paper setters and examiners.


(xii) NCERT and Examination Reform (1961) 
A number of departments of the Ministry [...]
National Institute of Basic Education, [...]
Audiovisual Education, D.E.P.S.E., [...]
Educational and Vocational Guidance, Central Bureau of Textbooks,
Curriculum and Research etc. were merged and some new
departments were created under the umbrella of the National
Council of Educational Research and Training (N.C.E.R.T.) in
September 1961. In another reorganisation, an independent
unit, the Examination Reform Unit, came into being. Soon
after, it became part of a newly created bigger
department named the Department of Curriculum and
Evaluation. Even this department was later abolished on the
recommendation of the Nag Chaudhari Committee (38)
appointed to review the work of the N.C.E.R.T. However, the
new Department of Textbooks created in the N.C.E.R.T.
continued work in the field of Examination Reform, but with
much less zeal. In yet another restructuring, the Department
of Textbooks was reorganised, and two new
parallel units were carved out of the parent Textbook Department
and named the Examination Reform Unit and the
Examination Research Unit. These units were again integrated
into one Department of Measurement and Evaluation, including
the sister unit of National Talent Search. With the last
reorganisation of the N.C.E.R.T., Measurement, Evaluation, Talent
Search unit and Survey unit were merged in 1984 to form a
bigger Department of Measurement, Evaluation, Survey and
Data Processing.


(xiii) 5th, 6th, 7th and 8th Conferences (1963-68) 
Among others, the following were the major recommendations of
these conferences (39).


Fifth (Delhi—1963) 

(a) To set up a seven-member Standing Committee to 
expedite Examination Reform Programme. 

(b) Each board should prepare a concrete plan of action to 
introduce reform in the existing system of examinations. 

(c) The Central Examination Unit to serve a clearing house 
function. 


Sixth (Poona—1964) 

(a) To take steps to improve practical examinations in science 
subjects and oral examinations in languages. 

(b) The boards should provide funds for training of paper
setters, examiners and moderators.

(c) The staff of the State Evaluation Units should be
augmented and funds be provided for their effective working.




Seventh (Mysore—1965) 

(a) To train pupil teachers of B.Ed./B.T. classes in improved 
evaluation procedures. 

(b) To prepare and circulate the paper on scaling procedures 
by the Central Examination Unit and a brochure on 
development and use of a library of test material. 

(c) The Central Examination Unit to coordinate the research 
activities of State Evaluation Units and the Boards of 
Secondary Education. 


Eighth (Ajmer—1967) 

(a) State Evaluation Units be attached to the Boards of 
Secondary Education. 

(b) Syllabi in different subjects should include specific instruc- 
tional objectives and detailed content outline. 

(c) Disparity of using cut scores by different boards to deter- 
mine pass or fail in a subject should be removed. 

(d) A State Evaluation Organisation as recommended by the 
Education Commission for each state, may not be created 
but the functions of State Evaluation Units be transferred 
to State Boards of School Education. 


(xiv) Education Commission (1964-66) 

This commission (40) considered evaluation as a continuous
process integral to the total system of education which exercised
a great influence on the study habits of students, instructions
and mode of evaluation. The following recommendations are
relevant to this paper:


(a) To improve written examinations for making them valid
and reliable measures of educational achievement.

(b) To devise tools and techniques for those areas of pupils'
growth which cannot be measured by written examinations.

(c) At the lower primary stage, evaluation should help students
to improve their achievement in basic skills and in the
development of right habits. At the higher primary stage,
use of oral tests, diagnostic tests and use of cumulative
record be made.

(d) There should not be external examinations at the end of
the primary stage. However, to maintain uniformity of
standards, periodical surveys may be conducted by district
school authorities using refined tests prepared by the State
Evaluation Organisation.

(e) For inter-school comparability, a common external exami- 
nation may be held by district authorities using standardised 
or refined tests. This is besides the merit scholarship 
examinations. 

(f) A certificate be issued by the Board to indicate students'
performance in different subjects without any reference to 
pass or fail. A student may be allowed to reappear in all 
or in one or more subjects to improve his performance. 

(g) Schools should issue a separate certificate showing the 
record of internal assessment as contained in a cumulative 
record. 

(h) A few selected experimental schools be established and 
given freedom to prescribe their own curriculum, textbooks 
and assess their own students and award certificates at the 
end of class X which may be considered equivalent to the 
Board certificate. However, the State Board will issue the
certificate on the recommendation of the schools.

(i) Comprehensive internal assessment of all aspects of students’ 
growth should be undertaken and this should be descriptive 
as well as quantitative. 


(xv) National Policy on Education (1968) 

A parliamentary committee was set up to examine the Education
Commission Report. The national policy on education developed
by this committee emphasised that "A major goal of examination
reform should be to improve the reliability and validity of
examination and to make evaluation a continuous process aimed
at helping the student to improve his level of achievement
rather than certifying the quality of his performance at a
given moment of time" (41).


(xvi) V.K.R.V. Rao Committee on Examinations (1971) 

With the increase in malpractices and incidents of violence in
external examinations, the Central Advisory Board of Education
at its 35th meeting expressed its concern and requested its
Chairman to appoint a committee on examinations. A committee
of seven members was appointed with the Union Education Minister
as Chairman. The other members were the Education Minister
of Andhra Pradesh (Vice Chairman); the Education Ministers
of Assam and Bihar; the Chief Executive Councillor, Delhi;
Mr. A.E.T. Barrow; and Prof. S.V.C. Aiya, Director, NCERT, as
Member-Secretary. The committee had six meetings between
August, 1970 and June, 1971. The major recommendations of
this committee (42) were as under:


(a) To enact legislation empowering examining authorities to 
take suitable measures regarding prohibition of weapons, 
assembly of persons, protection of invigilators etc. 

(b) To institute more than one examination where the number 
exceeds 10,000 school students or 1,000 college students. 

(c) Precautions for setting papers be taken well in advance by 
requiring model answers for them. Teachers should be free 
to comment on the paper after it is administered in the 
examination. 

(d) Method of spot evaluation at a central place should be 
adopted and results should be declared subject-wise and in 
the form of grades. 

(e) The certificate should indicate separately, in two columns, 
the results of public examination and that of internal 
assessment. 

(f) There should not be too many public examinations: only
one at the end of the middle/upper primary stage, another at
the end of the secondary stage and a third at the first degree
college. All other assessments should be only internal.

(g) Admission to colleges, including professional ones, and
recruitment to services should be on the basis of selection
tests.

(h) Both Central and State Governments should earmark funds
separately for research on examinations which should be
undertaken in a coordinated manner with the Boards and
Universities.

(i) A novel idea of organising and conducting examinations is
suggested. This includes multiple paper setting and their
use in place of one paper; involving practising teachers in
paper setting by inviting questions from them with model
answers and their scrutiny by a board to finalise papers
for public examinations; decentralisation of the conduct of
examinations when there are more students; and use of
project work for assessing creativity, initiative and special
skills.


(xvii) University Examination Reform Action Plan (1972) 

A working group set up by the Ministry of Education and 
Social Welfare brought out the document, ‘Examination Reform 
— Plan of Action' (43), and it was endorsed by the Commission 
at its meeting held on August 2, 1971. In 1974 the U.G.C. 
convened four zonal workshops relating to examination reform, 
autonomous colleges and postgraduate education, at Chandigarh, 
Madurai, Ahmedabad and Bhubaneshwar. A number of sugges- 
tions were made relating to internal assessment, question banks 
and grading system. The major recommendations were: preliminary
try-out of internal assessment in unitary universities;
immediate feedback of results; opportunity for students to
represent against the awarded grades; maintenance of records
for scrutiny; keeping internal and external evaluation marks
separate; development of question banks at each university and
their effective use in paper setting; a seven-point scale for
grading; the mechanics of grading and their interpretation; and
setting up of a grievance machinery to redress students'
grievances.


(xviii) Approach Paper on Ten-Year School (1975) 

The Ministry of Education and Social Welfare constituted an
expert group in 1973 to develop the curriculum for the 10+2
pattern of education. This group was further expanded in 1974
by including experts from the N.C.E.R.T. who drafted a version 
of the curriculum in 1972, revised in 1973. The final document 
called the Curriculum for the Ten-Year School—A Framework, 
was published in 1975 by the N.C.E.R.T. (44). This document
is characterised by the compositeness of its approach, flexibility
within the framework of acceptable principles and values, the 
relevance of the disciplines of the curriculum, the unit approach 
to teaching and learning and using evaluation as feedback. The 
major suggestions relating to examination reform were as under: 




(a) Evaluation should give reliable evidence on attainment of
all instructional objectives by the use of a variety of tools
and techniques.

(b) Evaluation by teachers should be done continuously and
the results fed back to pupils.

(c) Passing all subjects at a time should not be insisted upon 
and a semester system may be adopted. 

(d) At the primary stage evaluation should be integrated with
the process of learning, based on a continuous system of
recording pupils' progress using observation and oral tests;
written and practical tests may be used at the middle and
secondary stages in addition to other tools and techniques.

(e) There should be no 'pass' or 'fail' and students be graded
on a five-point rating scale. Students may be allowed to
improve their grades.

(f) A subject-wise/unit-wise record of students' performance
in scholastic and non-scholastic areas should be kept
regularly as part of the internal assessment.

(g) The system of internal assessment should be strengthened
and measures to reduce biases be taken till a stage comes
when external examinations become redundant and are
abolished.

(h) District Education Officers/Inspectors may set up a
committee to make sample checking of question papers and
answer scripts to check biases and ensure proper evaluation.
At the school-complex level and state level also, such
committees could be established.

(i) Community meetings may be held from time to time to
acquaint the community with the mode of evaluation adopted
and its use in improving pupils' achievement.


(xix) Approach Paper on Plus-Two Stage (1976) 


Following the 1975 Approach Paper on the Ten-Year School, a
paper was published by the N.C.E.R.T. on Higher Secondary
Education and its Vocationalisation (45) in 1976. This paper,
though it dealt mainly with the pattern of academic and
vocational courses at the plus-two stage and other problems
relating to vocationalisation, did make some recommendations
on evaluation of the courses. These were as under:


(a) There is need for dispensing with the public examination
at the higher secondary level by adopting a continuous
system of evaluation with a provision for checks and
balances. Effective supervision would be needed to maintain
a high standard of performance.

(b) A flexible system of two to six semesters be introduced and 
students’ performance on all these semesters be recorded 
on a result-card using a seven-point scale for grading. 

(c) There is need for establishing equivalence among vocational 
diplomas and certificates issued by various agencies. The 
National Council of Vocational Education should take up 
this work. 

(d) Since a student is expected to reach a particular level of 
competence to qualify for a certificate, students may be 
required to accumulate credit points for the course which 
is divided into various units or credit courses, requiring a 
different duration of time. Cumulative credit may be 
worked out subject-wise and a rule for getting a certain 
number of credits as a minimum requirement for passing the 
+2 stage may be framed. 


(xx) National Policy on Education (1986) 


The N.P.E. envisages the following emphases in the evaluation
process and examination reform:


1. Assessment of performance is an integral part of any process
of learning and teaching. As a part of sound educational
strategy, examinations should be employed to bring
about qualitative improvements in education.

2. The objective will be to recast the examination system so as
to ensure a method of assessment that is a valid and reliable
measure of student development and a powerful instrument
for improving teaching and learning. In functional terms it
would mean:


2.1. Elimination of excessive element of chance and
subjectivity.

2.2. The de-emphasis of memorisation.

2.3. Continuous and comprehensive evaluation that incorporates
both scholastic and non-scholastic aspects of
education spread over the total span of instructional
time.

2.4. Effective use of evaluation process by teachers, students
and parents.

2.5. Improvement in the conduct of examinations.

2.6. The introduction of concomitant changes in instructional
materials and methodology.

2.7. Introduction of semester system from the secondary
stage in a phased manner.

2.8. The use of grades in place of marks.


The above goals are relevant both for external examinations
and evaluation within educational institutions. Evaluation
at the institutional level will be streamlined and the
predominance of external examination reduced.


5. The Changing Emphases 


The last one and a quarter century of the history of examinations
in India indicates a number of trends. The shifts in emphasis
are indeed due to the visualisation of a broader view of
education and its purposes on the one hand and the influence
of examinations on the total educational process on the other.
Education is no longer equated with rote memorisation and
examinations are no longer considered as the end of teaching-
learning. We may trace the following changes in the emphases
during this period:


(a) The concept of the traditional examination, which dealt
only with collecting the information about pupils' achievement,
is giving way to a [...]ing judgements and
[...]n-gathering process.

(b) [...] assessment of pupil [...]nts to include non-[...]

(c) The purpose of examinations is no longer limited to
measurement of students' learning but is extended to
improving their learning.

(d) The function of examination is not only to test for factual
information but to encompass testing of all the instructional
objectives.

(e) Instead of treating evaluation as an end-of-the-course
activity, it is being considered as an integral part of the
teaching-learning process thereby ensuring a continuous
process of students' appraisal.

(f) Besides written examination other tools and techniques 
like oral testing, practical tests, observations etc. are now 
increasingly considered indispensable for students’ evalua- 
tion. 

(g) More attention is being paid to refining the measuring
instruments to make them valid and reliable tools of 
evaluation. 

(h) There is a tendency to depend more and more on teachers’ 
evaluations with a focus on total internal assessment. 

(i) Increasing need is being felt to discourage irreversible 
judgements of failure and make provision for students to 
improve their performance. 

(j) There is a clear shift towards the analysis of test results
for diagnosing pupils' difficulties in place of mere grading of
their achievement.

(k) There is a trend, especially in the measurement of non- 
scholastic traits, to use the five-point or seven-point grading 
system in place of numerical marking which is responsible 
for misclassification of students. 

(l) A clear emphasis is observable on the feedback of results
for reinforcement, improvement of students' learning and
the teaching-learning process.

(m) Meaningful certification of students' achievements is the
visible trend. This is being sought by reflecting a more
comprehensive and multifaceted growth of students as revealed
by external evaluation and internal evaluation separately.


A chronological summary of significant events in the history
of examination reform is given in the annexure that follows.
It is thus obvious from the historical development of
measurement and evaluation that the need for integrating evaluation
with the educational process is being realised more and more.
This means that it is the class teacher alone and not the
external evaluator who can achieve this goal. He has to be made
the main arbiter of a student's appraisal. In this task, the
role of his colleagues, students, supervisors and administrators
has to be appreciated. From examination to evaluation is a
long distance to be traversed. A number of new ideas and
concepts are involved. Unless the new terminology that is used
in current literature is understood, it is difficult to appreciate
the evaluative process. Therefore, a discussion on the basic
pedagogical concepts in evaluation is necessary before explaining
the process of evaluation.




ANNEXURE


Chronology of Education Events


Year (1) | Significant Recommendations (2) | Recommending agency (3)

1854 | Institution of the Entrance Examination as a first public examination. | Wood's Despatch
1882 | Suggestion for practical and academic courses. Domination of Matriculation Examination. | Hunter Commission
1902 | Assessment of grants to schools by results. | Indian University Commission
1904 | Many new schemes of examinations alternative to Matriculation. | Lord Curzon's Resolution
1913 | Combination of external and internal assessment. | Resolution of 1913
1919 | Creation of Intermediate colleges to act as basis for admission to university. | Calcutta University Commission (Sadler)
1929 | To reduce incidence of failure and limit rush to universities by provision of alternative courses. | Hartog Committee
1934 | Abolition of Intermediate colleges and case for 11 year (5+3+3) schooling. | Sapru Committee
1937 | Hierarchy of vocational institutions parallel to hierarchy in general education. | Abbot-Wood Report
1944 | Universal, compulsory and free education for age group of 6-14. | Sargent Report
1948 | Improvement of examinations by appointment of full-fledged board of examiners in each university and allocation of one-third marks for classwork. | University Education Commission
1952 | Introduction of internal assessment data in the certificate along with external data and a case for use of symbolic grading. | Secondary Education Commission
1954 | Setting up of All India Council of Secondary Education. | Ministry of Education
1956 | Setting up of a section in A.I.C.S.E. to promote research work and examination reform programmes. | Bhopal Seminar
1957 | Dr. Bloom's visit and development of ten-year phased programme of examination reform and its approval by the first conference of Chairman and Secretaries of Secondary Education. | [...]
[...] | Establishment of Pilot Examination Unit in A.I.C.S.E. | [...]
1963 | Establishment of the National Council of Educational Research and Training. Establishment of the State Evaluation Units. Setting up a Standing Committee on Examination Reform Programme. | Ministry of Education
1964 | Rajasthan Board of Secondary Education launches Examination Reform Project in collaboration with N.C.E.R.T. | N.C.E.R.T. and Rajasthan Board of School Education
1966 | Comprehensive evaluation of students by using variety of tools and techniques in internal assessment. Elimination of external examinations from the primary stage and use of periodic surveys or common external examination for comparability and maintenance of standards. | Indian Education Commission
1968 | National Policy on Education emphasising the improvement of quality of instruments and using evaluation for improvement of students' achievement. | Parliament Committee
1971 | Review Committee recommends multiple-paper setting, spot evaluation of scripts and separate indications of internal assessment and public examination performance in the certificate. | V.K.R.V. Rao Committee
1974 | Development of action plan for Examination Reform at university level. | University Grants Commission
1975 | Evaluation as feedback service and strengthening of internal assessment programmes for ultimate abolishing of external examinations. | N.C.E.R.T. Curriculum Framework for Ten-Year School
1976 | Unit-wise flexible system of two to six semesters with seven-point grading procedure leading to cumulative credits forming the basis of promotions. | Plus-Two Approach Paper, NCERT
[...] | Development of action plan for improving evaluation at the elementary stage. | Examination Reform Unit, N.C.E.R.T.
[...] | Integrated evaluation with teaching through continuous, comprehensive assessment in schools. Scaling of external examination marks, etc. | National Policy on Education

CHAPTER II 


CONTEMPORARY CONCEPTS IN 
EDUCATIONAL EVALUATION 


1. The Terminology Web 


In the field of educational evaluation India is one of the
developed countries. The work done during the last two decades
bears testimony to this statement. Some of the neighbouring
countries like Sri Lanka, Nepal, Malaysia and Afghanistan
have already received a lot of technical help in this field and got
their top planners in education trained by us through the
N.C.E.R.T. in the concepts and techniques of educational
evaluation. In recent years, especially during the present decade,
more and more concern for evaluation is being shown. The reasons
may be shrinking resources, dissatisfaction with educational
dividends, and decentralisation in the development and
implementation of educational programmes, among others.
Accountability has become the war cry of most of the new projects
of curriculum and in-service education programmes. With the
widening concept of education and variation in the context,
resources, process, competence of personnel and availability of
data-gathering devices, a number of new methodologies of
evaluation are being used. This has led to the emergence of many
new concepts in educational evaluation. In fact a jungle of
terminology has grown which has made it difficult for teachers to
comprehend and differentiate between some of the current terms in
use. It sometimes causes confusion even to educational
administrators and teacher educators. It is, therefore, desirable
to identify the major pedagogical concepts relating to educational
evaluation, define them and establish relationships among them.

The terms in vogue are test, examination, assessment, appraisal,
measurement and evaluation; diagnostic, formative and summative;
formal, non-formal and informal; internal, external and
exo-internal evaluation; rational, experimental and illuminative;
norm referenced, criterion referenced and self-referenced;
process, product and collateral; objective based, goal free and
pay-off evaluation; qualitative, quantitative and responsive;
micro, macro and mega evaluation; cognitive, affective and
psychomotor; teacher based, standardised and programmed
tests; self evaluation, participatory and teacher dominated;
pupil, curriculum and programme evaluation. Let us take up
related concepts one by one and understand their connotation,
scope and implications for the educational process.


2. Examination, Measurement and Evaluation


These are the most commonly used concepts and are often
used interchangeably in educational parlance. The word
'examination' has often been taken to mean a tool or instrument
rather than a process. Examination indeed denotes a process of
collecting evidence about pupils' growth. It is a technique or a
data-gathering device. As such it involves planning, construction
of tools, administration, scoring of scripts etc. In this process
one or more tools like a test or a question paper can be used,
besides teachers' observation or even students' product or
performance, to collect the needed information. Take for example
an examination in practical biology. When we ask our
students to examine the given specimen and identify it, it involves
a process of observing, using a microscope or lens, understanding
relationships among different parts, and recalling characteristics
of a class of animals or plants. So unless this information
is obtained, whether through the naked eye, a lens, a microscope
or previous knowledge, one cannot identify the given specimen.
An examination is thus a process of gathering information
about pupils' achievement and not a tool of measurement.


Measurement is an act of ascertaining pupils' growth or
achievement. It deals with the comparison of a quantity with an
appropriate scale for the purpose of determining the numerical
value on the scale that corresponds to the quantity to be
measured. According to Lindquist (1), "measurement is an act of
assigning numerals to an object or an attribute according to
certain rules." The numerals may be 1, 2, 3, 4, 5 or A, B, C, D,
E or any other used for the purpose. The object or attribute to be
measured may be achievement, physical efficiency, intelligence,
aptitude or any personal or social quality. Rules of assignment
of numerals may be, for example, to give A to only 7% of the
students or to award full marks to each correct spelling. Thus
measurement may be quantitative or qualitative in nature.
Sometimes the words 'measurement' and 'non-measurement' are
used in place of the above two terms. In any case this is a
concept which deals with the 'how much' of a thing. To determine
this, one has to use some measuring device or tool which
facilitates the collection of the needed information or refines
the act of measurement. A thermometer, a lens, a scale or a
question paper are all tools or instruments of evaluation which
help us to make a measurement or bring precision to it.
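
The two rules of assignment mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the label 'other' for all grades below A and the choice to round the number of A grades downwards are hypothetical and are not taken from Lindquist.

```python
def award_spelling_marks(responses, answer_key, marks_per_word=1):
    """Rule: award full marks for each correctly spelt word."""
    return sum(marks_per_word
               for given, correct in zip(responses, answer_key)
               if given == correct)


def assign_top_grade(scores, top_percent=7):
    """Rule: give grade 'A' to only the top 7% of the students.

    scores maps a pupil's name to a numerical score.
    Assumption: the number of A grades is rounded down, and tied
    scores are separated by the order in which they were entered.
    """
    n_top = len(scores) * top_percent // 100
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = set(ranked[:n_top])
    return {name: ("A" if name in top else "other") for name in scores}
```

In both cases a numeral or letter is attached to an attribute by a fixed rule, which is all that measurement requires. Under the rounding-down assumption a class of 100 pupils receives exactly seven A grades, while a class of fewer than 15 pupils receives none, so the rule itself has to be chosen with the size of the group in mind.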

Measurement is, therefore, a broader term than examination
which is only one of the many other techniques or processes of
collecting information like observation, interview etc., which
are involved in the act of measurement. Likewise, an examination
is a broader concept than a tool like, say, a question paper
which is one of the many tools concerned with the gathering of
information.


Evaluation is a still broader concept than measurement.
According to Popham (2) systematic evaluation consists of a formal
assessment of the worth of an educational phenomenon. It denotes
a formal attempt to collect information and to reach a judgement
of merit. Tenbrink (3) defines evaluation as "a process of
obtaining information and using it to form judgements which in
turn are to be used in decision making." This term is used in
close conjunction with measurement but it cannot be used
interchangeably. Evaluation refers to the adequacy or
worthwhileness of a performance. It involves value judgements
which are based on measurement. The process of determining the
degree of adequacy depends heavily on the information obtained by
formal measurement. Evaluation, therefore, involves value
judgements which in turn are formed on the basis of measurement
(quantitative) or non-measurement (qualitative). Measurement is,
therefore, a pre-requisite for any evaluation. There cannot be
any evaluation without measurement. However, measurement is not
synonymous with evaluation. Whereas measurement deals with the
'how much' of a certain thing, evaluation goes a step further to
tell us 'of what value' that measurement is. The relationship
between the two is depicted clearly in a mathematical fashion by
Gronlund (4) as under:


Evaluation = Measurement + Value judgement 


Let us take an example to concretise our understanding of the
two terms. Suppose Anjum got 56 marks in a class test in
English. The average performance of the class is 42. The highest
score is 60 and the lowest score 15. A score of 56 is an index
of her measured performance or her status. It does not indicate
how good or how bad this score is. We have only measured
and not found out the worth of that measurement. But when, on
analysis, we find that she has secured the third position in
the class and has a good performance with respect to the class,
we are evaluating her performance; a value is being attached to
it. If measurement is status determination, evaluation is worth
determination.
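
The distinction can be made concrete with a small Python sketch. The class list below is hypothetical: only Anjum's 56, the average of 42, the highest score of 60 and the lowest of 15 come from the example, and the other scores are invented simply to be consistent with those figures. The wording of the judgement is likewise an assumption made for illustration.

```python
from statistics import mean

# Hypothetical class scores, chosen only to match the figures in the text:
# average 42, highest 60, lowest 15, with Anjum's 56 in third position.
scores = {"Anjum": 56, "P1": 60, "P2": 58, "P3": 45, "P4": 42,
          "P5": 40, "P6": 38, "P7": 36, "P8": 30, "P9": 15}


def measure(name):
    """Measurement: report the status (the raw score) and nothing more."""
    return scores[name]


def evaluate(name):
    """Evaluation: attach a value judgement to the measured score."""
    score = scores[name]
    class_mean = mean(scores.values())
    # Rank 1 is the highest score; equal scores share a rank.
    rank = 1 + sum(1 for s in scores.values() if s > score)
    judgement = "above average" if score > class_mean else "average or below"
    return {"score": score, "mean": class_mean, "rank": rank,
            "judgement": judgement}
```

Here measure("Anjum") yields only the number 56, whereas evaluate("Anjum") places that number against the class average and the rank order before passing a judgement on it, which mirrors the step from status determination to worth determination.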

From the above explanation we can identify the relationship
between examination, measurement and evaluation. Evaluation
is a three-stage process. Information gathering through
examination or any other technique is the first stage. In the
second stage this information provides a score or letters or some
sort of description which is the result of measurement (or
non-measurement). In the third stage this measurement provides the
basis for forming judgements that reflect the worthwhileness or
otherwise of the attribute measured.


3. Test, Assessment and Appraisal 


According to Good's dictionary, a test is a device or procedure
for determining the truth, falsity or meaning of a hypothesis (—).
Teachers are generally not aware of the type of hypothesis
they are trying to test whenever a test is administered.
Indeed they do test one or more hypotheses, such as those
relating to the development of concepts, the attainability of
the instructional objectives or the comparative achievement of
two or more groups. A test is one tool of examination to collect
a particular type of information about pupils' growth. Other
devices of examination may be observation, inquiry, analysis
or opinions. A test could be oral, written or practical depending
upon the nature of the hypotheses to be tested.


Assessment is referred to as a teacher's estimate of a value
against a certain standard. It is, therefore, only an estimate, not
an accurate observation of an attribute. This term might have
intruded into the educational parlance from the field of taxation
where only a rough estimate is made about the property in
question. This term suits very well in the context of internal
assessment where effective assessment of non-scholastic traits is
generally approximate and cannot be measured accurately like
cognitive abilities.


Appraisal is another term which is used interchangeably with
assessment and evaluation. It is generally defined as a formal
and accurate valuation of some characteristics against some
criteria usually made by persons familiar with such values (—).
In contrast to estimation which can be informal and an
approximation, appraisal connotes well defined criteria against
which one is to estimate the worth of an attribute or a
characteristic with a view to sell the idea, thing or material. In
a way it is a more refined way of estimating the value of a thing
by using such criteria as are supposedly developed by
knowledgeable people.


4. Written, Oral and Practical Examinations


The nature and scope of these three types of examination is
determined by the mode of collecting information about pupils'
growth. If pupils' responses are gathered through the written
word then we call these written examinations. If the medium
of students' response is oral or aural then we term it an oral
examination. If the medium of testing is the performance of the
students then practical examination is the type. Written
examination is normally undertaken with the help of paper and
pencil. It may take the form of a unit test, a question paper
or a written assignment of any kind. Since the written word is
not necessary in the case of such achievement tests where only
diagrams are used as a medium, the term "paper-pencil test"
seems to be more appropriate than written examination. Most
class tests and public examinations belong to this category of
examinations.

Certain instructional objectives are difficult to assess through
the written word because we are interested in the measurement of
skills like those of speaking, listening, reading and oral
communication. Here the medium of response is the oral word. This
may take the form of conversation, dialogue, oral question or
viva-voce in the case of science subjects. Things which cannot be
tested through the written word can be tested through oral means.
In that way oral examinations are more comprehensive in scope
than written ones. Except for diagrammatic skills and written
expression, all other objectives can be tested through these
examinations.

Apart from the written or the oral word there is the medium
of actual performance of the individual which forms the basis
of examinations. Their scope is however limited to the
assessment of psychomotor skills like those of observation,
manipulation, dissection, collection etc. A practical examination
indeed is the most reliable method of collecting evidence about
pupils' growth as far as the development of experimental skills
is concerned. Their scope is limited mainly to science subjects,
physical education, socially useful productive work etc. A
well-thought-out practical examination is the most comprehensive
in that it not only involves testing of psychomotor skills but
also some oral work (viva-voce) as well as some written work in
the form of observational record and drawing work. A separate
chapter is devoted to the practical examinations for discussing
further details.


5. Cognitive, Affective and Psychomotor Outcomes Evaluation


Since Bloom’s taxonomy of educational objectives (5), these words
have been more widely associated with teaching and testing.
Cognitive evaluation refers to the testing of those objectives
which fall under the category of intellectual development, e.g.
knowledge, comprehension, application, analysis, synthesis and
evaluation. These are the six objectives classified as cognitive
objectives. Thus testing for these objectives has come to be
recognised as cognitive assessment or evaluation.

Similarly, assessment in terms of affective objectives has been
recognised as affective evaluation. It covers those objectives
which are concerned with the development of attitudes,
interests, appreciations and personal and social development.
It is this area of pupils’ development which can be assessed
better only through internal assessment. However, the techniques
involved are too technical to be used by teachers in the
classroom. Psychomotor assessment relates to those objectives
which require students to perform some activity or do some
practical work. This involves assessment of the product of
performance as well as the process of performance. Assessment of
such areas of development, which involve motor activities,
manipulation of objects or performance of a task like a physical
activity, production of a model, handling of a microscope etc.,
involves on-the-spot observation and assessment of the
performance of students.


6. External, Internal and Exo-internal Evaluation


These terms are quite familiar to all teachers. However, there is 
definitely a confusion about their scope. At present external 
examinations are equated with only public examinations which 
become the basis of certification of students. All other exami- 
nations which come under the purview of the school are deemed 
as internal examinations. This is only one notion which is based 
on the agency conducting the examinations. From this it follows
that examinations in a school, whether conducted by an outside
person or by a teacher who does not even teach that particular
class or section, are all internal assessments. This view does
not seem to hold ground because in all these cases the examiner
is not connected with the instructional process of the students
who are being evaluated. The criterion indeed is the knowledge,
on the part of the evaluator, of what has been taught and how it
has been taught. It is only the class teacher who teaches the
subject who is aware of this. If an examiner is unaware of the
unit objectives set in advance as also the learning experiences
provided to the students, he is not in a position to evaluate
them properly. Therefore internal evaluation is one in which the
teacher who teaches the students also evaluates them. All other
types of evaluation, whether done in public or in schools, inside
or outside the school, by a teacher of the same school or from a
different school, are external evaluations. Thus the three
criteria for an internal evaluation are: familiarity with the
teaching-learning activities of the class, setting of the
question paper by the class teacher teaching that class, and
conduct of the examination and evaluation of scripts by the
teacher himself, as in the case of unit testing.


Exo-internal evaluation envisages a combination of both external
and internal assessments. This combination can be in terms of the
setting of the examination, conduct of the examination,
evaluation of the scripts or certification of students. The
following are some of the possible practices in this regard.


(a) Papers are set externally but evaluation of scripts is done
internally by the schools, as is done by the Central Board
of Secondary Education in subsidiary subjects like Sanskrit
and Mathematics at the Higher Secondary stage.

(b) Papers are set externally, the examination is conducted
externally but the scripts are evaluated internally, as done
in some of the state examinations held at the end of the
primary stage.

(c) Papers are set internally by teachers of the school not
teaching the class but scripts are evaluated by the class
teachers, as is the case in many of the annual examinations
up to class IX.

(d) Papers are set internally by the class teachers but
evaluation of scripts is done by teachers not teaching that
class.

(e) A team of class teachers sets the paper but evaluation is
done either by the class teacher or by those who are not
involved in teaching.

(f) The paper is set by the class teacher of one section and is
evaluated by the class teacher of the other section.


There can be some more variations of this. As regards
certification, there are again different patterns. One, the
certificate issued by the Board contains only the performance of
students in the external examination. Two, along with the
external assessment there is also mention of the internal
assessment performance in the same certificate. Three, the Board
issues certificates on the external performance while the school
issues certificates showing performance on the internal
assessment. Four, certificates are given only by schools on the
basis of their own examinations conducted during the session, and
students are promoted or detained on the basis of the schools’
own certification, but the certificate is issued under the seal
of the Board. This is the case in the two autonomous schools of
Rajasthan located at Udaipur, Vidya Bhawan and Banasthali
Vidyapith.


7. Formal, Non-formal and Informal Evaluation


Formal evaluation is the conventional, ceremonious form of
evaluation having an established mode of conduct. Rigid,
punctilious and set rules are observed in carrying out formal
evaluation. The formality may be in timing, frequency, mode of
data collection, forming judgements or taking decisions. A public
examination, a term test or a unit test are all part of formal
evaluation, as is a standardised test. But the extent of
formality in terms of rigidity, preciseness or explicitness
pertaining to timing and mode of conduct varies across these
tests. A standardised test is more formalised than a public
examination, which in turn is more formal than a term test or a
unit test.

Formal evaluation presupposes a planned and highly structured
process of evaluation. Its place in the system is
institutionalised and represents the official mode in vogue. This
form is applicable to both curricular and co-curricular areas of
pupil evaluation. Its place and time of conduct are
predetermined. Tools of evaluation are specified beforehand. Data
are collected in a set form. Judgements are made in accordance
with the established norms or tradition. The focus of such
evaluation is on grading, classification and certification of
students. Formal evaluation, therefore, refers to that mode of
evaluation in which the various steps involved in the evaluative
process are conventionalised and have acquired well-defined and
ritualised methodologies.


Non-formal evaluation seems to have evolved more recently in
relation to non-formal education. Just as non-formal education
has a flexible frame of purposes, curricula, processes,
instruction and evaluation, the term non-formal evaluation may be
viewed in that very context. In contrast to formal evaluation,
non-formal evaluation is neither institutionalised nor that well
structured. It may be planned or unplanned but it is not
ceremonious and rigid in its operation. Its scope is not clearly
delineated in advance but is adjustable to the situation and is
thus need-based. Unlike formal tests, non-formal tests are timed
according to the need and progress of the subgroups or
individuals. Spacing of tests is not determinable in advance. For
data collection we have to depend on sources other than the
teacher also. Judgement-making rests more on individual
performance than on class performance. Thus self-referenced
judgements are more valuable than class-referenced judgements.
The purpose is mainly the improvement of the individual’s
proficiency by regular feedback. This term may preferably be
applied in the case of such individuals or groups as are being
educated through methods other than formalised schooling, e.g.
adult classes for functional literacy, correspondence courses,
remedial classes etc. Non-formal evaluation, therefore, is
unconventional, flexible, need-based and unceremonious in its
purpose, scope, procedures and methodology of judgement-making.
Its place in traditional schooling seems to be out of tune and is
difficult to justify unless a close linkage between the two
systems is established.


Informal evaluation is as old as the reasoning power of man.
Selection of appropriate jungle leaves for food, choice of a
subject in school, rejecting a certain variety of fruit when
compared to another, a positive comment on a movie, making a
choice among different pieces of cloth for your garments,
reflecting on Rita as a poor reader, saying “it is not a good
buy”, are all examples of informal evaluation. We encounter such
examples in our daily life from morning till evening. In all such
evaluations one is neither conscious of the objective of
evaluation nor is it a planned or deliberate effort to evaluate.
It is all incidental and totally unstructured. Its scope is
unlimited and cannot be circumscribed or determined in advance.
The place of evaluation, which is fixed in both formal and
non-formal evaluation, is indeterminable in informal evaluation.
Its timing cannot be settled or regulated. The mode of data
collection rests mainly on one’s observation or previous
experience. Therefore, judgement-making is generally
experience-referenced. The focus of informal evaluation is either
on choosing among the alternatives or on the reflective
judgements of the observer.

In a school there are a number of situations where informal
evaluations are made by teachers about the students, fellow
teachers and administration. Informal testing goes on in teaching
through the use of oral questions. An unannounced testing of
students on a particular topic or concept, assessment of
unaccountable home or school assignments, correcting students’
pronunciation or intonation during reading aloud, rectifying
students’ mistakes in composition or in the practical class and
scores of other such situations are instances of informal
evaluation. In fact informal evaluation is part and parcel of a
teacher’s personality, who tends to evaluate inside the class,
outside it, in the home, in the market. It is thus an unintended,
untimed, unplanned and unstructured form of evaluation. If the
frequency of classroom informal evaluations is increased and the
teacher becomes conscious of the role of such evaluation,
students can be relieved, to a great extent, of the anxiety, fear
and phobia of formal examinations. If the formal examinations are
made more and more informal and conducted in a non-formal manner,
teachers and not the students will be more afraid of preparing
for such examinations.


8. Summative, Formative and Diagnostic Evaluation


The terms ‘summative’ and ‘formative evaluation’ were for the
first time conceptualised by Michael Scriven (6) in his classic
(1967) essay on the methodology of evaluation. According to him,
summative evaluation refers to the assessment of the worth of an
instructional programme which is completed, while formative
evaluation refers to the assessment of the worth of an
instructional programme which can still be modified. A summative
evaluator gathers information and judges the merit of the overall
instructional sequence in order to retain or adapt that sequence.
The audience of the summative evaluator is the consumer of the
instructional programme, in contrast to the formative evaluator,
whose audience is the designer and the developer of the
programme. A formative evaluator is a partisan of the
instructional sequence and does everything to make teaching or
learning better. A summative evaluator is an uncommitted,
non-partisan person who is to pass judgement on an instructional
endeavour. Diagnostic evaluation is in a way intensive formative
evaluation concerned mainly with the placement of pupils in the
instructional sequence so as to give them the maximum benefit of
a proper starting point. For this, it is necessary to determine
the entry behaviour which is a pre-requisite to learning the new
unit, as well as the personality pattern of students in terms of
their interests, attitudes, skills, aptitudes etc. Diagnostic
evaluation provides the answer to this.

A very clear distinction is made between these three concepts by
Bloom, Hastings and Madaus (7). Summative evaluation, according
to them, is judgemental in nature. Its purpose is to appraise the
end-products of the instructional effort, whatever may be the
teaching-learning process. To distinguish it from formative
evaluation, it is an end-of-course activity concerned with
assessment of the larger instructional objectives of a course or
a substantial chunk of the course. Our public examinations,
annual and term tests are all summative tests used for making
judgements about students’ learning or achievement. The focus of
summative evaluation is on measurement of pupils’ achievement and
not on their improvement. Thus it is a status evaluation of
students. The major function is that of grading, promotion or
certification of achievement. It may take place at the end of a
unit, term or course of studies. Its emphasis is generally on
measurement of cognitive behaviours, sometimes on psychomotor and
occasionally on affective behaviours. Instrumentation is limited
to final or summative examinations, through a weighted sample of
course objectives. The average difficulty level of questions
ranges from 35% to 70%. Scoring, though normally norm-referenced,
can also be criterion-referenced. Reporting of scores is by
objectives. Summative evaluation is thus a judgemental activity
focussed on certification of students’ achievement.


Formative evaluation is developmental, not judgemental, in
nature. Its purpose is to improve students’ learning and
instruction. Therefore, its major function is feedback to the
teacher and students to locate errors in the teaching-learning
process in order to improve it. It operates during instruction
and is generally limited to assessment of cognitive behaviours.
All classroom assessments which are not used for grading
purposes, whether these are unit tests, informal tests,
questioning during teaching, home assignments or teachers’
classroom observation of pupils’ responses, are examples of
formative evaluation. For formal testing, specially designed
instruments are devised. As for judgements or scoring, it is
criterion-referenced, not norm-referenced as in summative
evaluation. Decisions made relate to steps to be taken to improve
the instructional programme vis-a-vis pupils’ learning.
Reporting of pupils’ progress is done in terms of an individual
pattern of pass-fail scores on different tasks in the hierarchy
of learning outcomes. Formative evaluation is, therefore, a means
of determining what the pupils have mastered and what is still to
be mastered, thereby indicating the basis for improvement of
students’ learning.


Diagnostic evaluation is in a way formative as it aims at
discovering those areas of a student’s weakness that hinder his
progress. This could be due to the absence of the requisite entry
behaviour or level of mastery in a subject, an inappropriate
instructional mode etc. Accordingly, the major function of
diagnostic evaluation is that of placement of students.
Determining the presence or absence of pre-requisite knowledge or
skills, the prior level of mastery, classification of students
into different modes of instruction and the causes of repeated
learning difficulties all come under the purview of diagnostic
evaluation. Its timing is related to placement at the
commencement of a unit, term or academic year. It also has a
place during instruction when students repeatedly make mistakes
and are not able to profit from instruction. As far as emphasis
is concerned, it is not limited to cognitive, affective and
psychomotor behaviours but is also concerned with diagnosis of
physical, psychological and environmental factors which result in
the poor performance of students. For instrumentation, ordinary
formative and summative tests like unit tests or readiness tests
can be used for pre-testing. Teacher-made tests, observation,
checklists, standardised achievement tests and standardised
diagnostic tests are also used. For sampling purposes, a sample
of weighted course objectives, a specific sample of each
pre-requisite behaviour or a sample of non-educational and
environmental factors is taken. In testing, a large number of
easy items of about 65% or higher difficulty level are used.
Judgements made are both norm-referenced and
criterion-referenced. Reporting is done through individual
profiles indicating performance on subskills. Diagnostic
evaluation is, therefore, focussed on identification of
pre-requisite knowledge, skills and environmental factors which
enable the teacher to place the pupil at the right point on the
achievement continuum so as to enable him to achieve his maximum.
If proper diagnostic evaluation is done at the appropriate time,
it facilitates formative evaluation, to which it is complementary
as well as supplementary in providing the needed evidence. If
diagnostic evaluation reflects on the learnability of a student,
formative evaluation provides clues to improve his learning while
summative evaluation certifies his achievement.
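
The difficulty levels quoted in this section for summative tests
(35% to 70%) and for diagnostic tests (about 65% or higher) can
be illustrated with a small computation. The sketch below is only
an illustration. It assumes that the difficulty level of an item
means the percentage of pupils who answer it correctly, so that a
higher figure indicates an easier item. The function names and
the response data are hypothetical.

```python
def difficulty_level(responses):
    """Return the percentage of pupils who answered the item correctly.

    Assumption: 'difficulty level' is read here as percent correct,
    so a higher figure means an easier item.
    """
    if not responses:
        raise ValueError("at least one response is needed")
    return 100.0 * sum(responses) / len(responses)


def within_band(percent, low, high=100.0):
    """Check whether a difficulty level lies in the band [low, high]."""
    return low <= percent <= high


# Hypothetical responses of ten pupils to one item (1 = correct, 0 = wrong).
item_responses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

level = difficulty_level(item_responses)
suits_summative = within_band(level, 35.0, 70.0)  # band quoted for summative tests
suits_diagnostic = within_band(level, 65.0)       # "about 65% or higher"

print(level, suits_summative, suits_diagnostic)
```

With these made-up responses six of the ten pupils answer
correctly, so the item falls within the summative band but is not
easy enough for the diagnostic one.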


9. Norm-referenced, Criterion-referenced and Self-referenced 
Measurement


Norm-referenced measurement is the traditional class-based
assignment of numerals to the attribute being measured. It means
that the measurement act relates to some norm, group or typical
performance. It is an attempt to interpret the test results in
terms of the performance of a certain group. This group is a norm
group because it serves as a referent or norm for making
judgements. Test scores are interpreted neither in terms of the
individual (self-referenced) nor in terms of a standard of
performance or a pre-determined acceptable level of achievement
called the criterion behaviour (criterion-referenced). The
measurement is made in terms of a class or any other norm group,
as the function is to relate an individual’s measurement to some
norm group (class). The purpose is to produce response variance,
i.e. to see the extent to which an individual varies or differs
from the performance of the group to which he does or does not
belong.

Almost all our classroom tests, public examinations and
standardised tests are norm-referenced as they are interpreted in
terms of a particular class and judgements are formed with
reference to the class, which is considered as a type. Who is the
most intelligent boy in the class? Who stood first? Who got the
least marks? Is he better than 50% of the students in the class?
These are questions which involve norm-referenced judgements.
Such judgements use the performance of some group on the same
task as a referent. We compare an individual’s performance with
similar information about the performance of others. That is why
selection decisions always depend on norm-referenced judgements.
Prediction and many placement decisions also depend on these
types of judgements. A major requirement of norm-referenced
judgements is that the individuals being measured and the
individuals forming the group or norm are alike. The conditions
under which the referent (norm) was obtained and the conditions
under which the original information was obtained are also
assumed to be similar. Another criterion is that the referent
used in norm-referenced judgements should have the minimum error
so as to yield reliable and accurate judgements. Unless the
referent used is up to date, the comparison of an individual’s
performance with the group (with an outdated referent) is of no
use as it would lead to faulty interpretations. Thus
norm-referenced measurement presupposes an up-to-date, reliable
referent (norm group) of like individuals obtained under like
conditions.
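
Questions such as "Who stood first?" or "Is he better than 50% of
the students in the class?" can be answered mechanically once the
scores of the norm group on the same task are available. The
following is a minimal sketch of such a norm-referenced
interpretation. The marks, the pupil's score and the function
names are hypothetical and serve only as an illustration.

```python
def rank_in_class(score, class_scores):
    """Rank of a score in the norm group (1 = highest; ties share a rank)."""
    return 1 + sum(1 for s in class_scores if s > score)


def percent_scoring_below(score, class_scores):
    """Percentage of the norm group scoring below the given score."""
    below = sum(1 for s in class_scores if s < score)
    return 100.0 * below / len(class_scores)


# Hypothetical marks of a class of ten pupils on the same test.
class_scores = [35, 42, 48, 51, 55, 60, 63, 67, 72, 80]
pupil_score = 63

rank = rank_in_class(pupil_score, class_scores)
percent_below = percent_scoring_below(pupil_score, class_scores)
better_than_half = percent_below > 50.0

print(rank, percent_below, better_than_half)
```

Here the same mark of 63 acquires its meaning only from the
class: with a stronger or weaker norm group the rank and the
percentage would change although the pupil's performance had not.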


Criterion-referenced measurement has its origin in the writing of
objectives by Mager (9), who urged teachers to specify a
criterion of acceptable performance while stating an
instructional learning outcome, an intended level of proficiency
of the learner or a desired standard of performance. Thus in
contrast to a norm-referenced measure we can reference an
individual’s performance to a predetermined criterion which is
well defined. This type of measurement is termed
criterion-referenced measurement. It determines an individual’s
status with reference to a well-defined criterion behaviour. It
is an attempt to interpret test results in terms of clearly
defined learning outcomes which serve as referents (criteria).
The success of criterion-referenced tests lies in the delineation
of well-defined levels of achievement, which are usually
specified in terms of behaviourally stated instructional
objectives. According to Glaser (1963), underlying the concept of
measurement of achievement lies the notion of a continuum of
knowledge acquisition ranging from no proficiency to perfect
performance. It is on this continuum of knowledge that an
individual’s status regarding his achievement is to be
determined. Unlike norm-referenced measurement, the criterion
level of the minimum acceptable performance for each objective is
specified in advance in criterion-referenced tests.
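
The idea of a minimum acceptable performance specified in advance
for each objective can be sketched as follows. This is only an
illustration under stated assumptions: the objectives, the
criterion levels (minimum number of correct answers out of ten
items per objective) and the pupil's scores are all hypothetical.

```python
def criterion_referenced_report(scores, criteria):
    """Compare each objective's score with its pre-specified criterion.

    Returns True for an objective when the criterion level is reached.
    No reference is made to the performance of other pupils.
    """
    return {objective: scores[objective] >= minimum
            for objective, minimum in criteria.items()}


# Hypothetical criterion levels fixed before testing
# (minimum correct answers out of 10 items per objective).
criteria = {
    "adds fractions": 8,
    "solves linear equations": 7,
    "reads bar graphs": 9,
}

# Hypothetical scores of one pupil on the same objectives.
scores = {
    "adds fractions": 9,
    "solves linear equations": 6,
    "reads bar graphs": 9,
}

report = criterion_referenced_report(scores, criteria)
print(report)
```

The report shows which objectives have been attained and which
are still to be mastered, whatever the standing of the pupil in
the class may be.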

Use of criterion-referenced measurement at the elementary stage,
where learning of basic skills and fundamental concepts is
essential, is a must to lay a proper foundation for learning at
the secondary stage. It does away with the unfair comparison of
an individual with other children. The major difficulty is the
establishment of an achievement continuum in accordance with the
complexity of the skills or concepts involved. In fact, both
norm-referenced and criterion-referenced measurements have a
place in the teaching-learning process and the details are
discussed in a separate chapter on criterion-referenced tests.


Self-referenced measurements relate to the comparison of an
individual’s performance with himself. Here the referent is
neither a norm group (class) nor the criterion behaviour
(expected performance) but the individual himself. It is an
attempt to interpret test results in terms of the individual’s
own performance at two different times. Therefore, the referent
is his previous performance. If he scores more than his previous
score he is improving, and if he scores less his progress is not
as desired. This progress can, however, be determined either in
terms of the rate of progress relative to the norm group (class)
or in terms of predecided or prespecified learning outcomes or
performance objectives. In the former case his rank in the class,
for example, or the deviation of his scores from the mean class
performance indicates his progress or retardation. In the latter
case his performance may be judged in terms of his comparative
performance (now and then) on the instructional outcomes or level
of proficiency in prespecified tasks or criterion behaviours.

The basic principle of self-referenced judgements is the
acceptance of the uniqueness of the child in his learning rate.
Such judgements are valuable for the weaker section of learners
in preventing frustration and developing among them a sense of
achievement. Ekta’s position is now 10th in the class while it
was 15th in the last examination. She can now solve sums on
quadratic equations which she was not able to do in the last term
test. She has gone down in Biology when compared to her previous
performance. These are all examples of self-referenced
judgements, which take into account the individual’s own
capacities and efforts. The three types of measurement,
norm-referenced, criterion-referenced and self-referenced, are
not mutually exclusive and should in fact be considered in the
form of a triangle in which an individual is viewed differently
from each angle to get the desired perspective.
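
A self-referenced judgement of this kind needs nothing more than
the pupil's own earlier and present scores. The sketch below is
an illustration only. The subjects echo the examples given for
Ekta, but the marks themselves and the function name are
hypothetical.

```python
def self_referenced_change(previous, current):
    """Judge a pupil's present score against his or her own earlier score."""
    change = current - previous
    if change > 0:
        return "improved"
    if change < 0:
        return "declined"
    return "unchanged"


# Hypothetical marks of one pupil in the last term test and the present one.
previous_marks = {"Mathematics": 48, "Biology": 62}
current_marks = {"Mathematics": 57, "Biology": 55}

progress = {
    subject: self_referenced_change(previous_marks[subject], current_marks[subject])
    for subject in previous_marks
}
print(progress)
```

The referent here is the pupil's previous performance alone. The
same marks could also be read against the class or against a
criterion, which is the point of viewing the individual from each
angle of the triangle.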


10. Quantitative, Qualitative and Responsive Evaluation


The term quantitative evaluation is widely used in education. In
fact, the word ‘measurement’ is often equated with quantitative
measurement. As such, quantitative evaluation refers to the
assignment of numerals to an object of measurement according to
certain rules. Here object means any attribute of the personality
of the individual. It is historically the most potent means of
evaluating students, whose performance or achievement is recorded
in the form of scores, which are some kind of numbers. Its wider
use can be attributed to the facility it offers for manipulation
of numbers for comparison, selection, grading or certification.
It is applicable to all curricular areas. A major source of data
for quantitative evaluation is testing, which may employ
teacher-made or standardised tests for the collection of data.
The scores or marks, which are normally called raw scores, are
converted into various forms. These may be standard scores or any
other type of comparable scores. Written examinations, unit
tests, term tests, diagnostic tests or any other form of testing
all yield data which are generally represented by numbers and
are, therefore, amenable to manipulation. The validity and
reliability of measuring instruments that yield numerical data
can be more easily maintained or improved. Judgements which are
formed on the basis of quantitative evaluation are generally
norm-referenced. Quantitative evaluation provides measurement
data in such a form that it is easier to interpret them and to
take decisions regarding teaching or learning. A major limitation
of quantitative measurement is the lack of the needed reliability
of the measuring instrument. If marks are not considered sacred
and are interpreted in the light of the quality of the measuring
instrument, they can be more useful. Quantitative evaluation is,
therefore, the traditional mode of collecting evidence through
various tests which yield numerical data that form the basis for
judgement-making and decision-taking.
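
The conversion of raw scores into standard scores mentioned in
this section can be sketched as follows. This is a minimal
illustration and not a prescribed procedure. It assumes the usual
z-score (deviation from the group mean in standard deviation
units) and, as one example of a comparable score, the T-score
with mean 50 and standard deviation 10. The raw marks are
hypothetical.

```python
from statistics import mean, pstdev


def standard_scores(raw_scores):
    """Convert raw scores to z-scores using the group's mean and SD."""
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)  # SD of the whole group, not a sample estimate
    if group_sd == 0:
        raise ValueError("all scores are equal; standard scores are undefined")
    return [(score - group_mean) / group_sd for score in raw_scores]


def t_scores(raw_scores):
    """Rescale z-scores to T-scores (mean 50, SD 10) to avoid negative values."""
    return [50 + 10 * z for z in standard_scores(raw_scores)]


# Hypothetical raw marks of five pupils on one test.
raw_marks = [40, 50, 60, 70, 80]

z_values = standard_scores(raw_marks)
t_values = t_scores(raw_marks)
print([round(z, 2) for z in z_values])
print([round(t, 1) for t in t_values])
```

Once marks from different tests are expressed on such a common
scale they can be compared or combined, which is the facility of
manipulation referred to above.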


Qualitative evaluation, in contrast to quantitative evaluation,
is that mode of appraisal of students’ achievement or growth
which uses qualitative measurement, or non-measurement as it is
sometimes described. The objects or attributes of measurement in
this type of evaluation are difficult to measure quantitatively.
For example, personal and social qualities like intellectual
honesty, civic sense, cooperativeness, patriotism, scientific
attitudes etc. require the use of qualitative rather than
quantitative evaluation. The basic principle underlying
qualitative evaluation is the assumption that the attribute may
be visualised only in terms of the degree or extent of its
achievement or growth on a continuum. It may, therefore, be
expressed in terms of a verbal description, letter grading,
graphical representation, rank etc. Tools and techniques used in
this type of evaluation are observation, rating scales,
checklists and such other instruments which do not yield scores.
Therefore, qualitative evaluation yields subjective data in
contrast to the objective data that accrue from quantitative
evaluation. As for the methodology of judgement-making, the data
are compared to the criteria or the attributes comprised by each
trait. Therefore it is, in a way, criterion-referenced judgement.
There is no point in comparing an individual’s attainment in the
affective domain with that of his counterparts. It is a
self-referenced judgement that may be made. A major limitation of
qualitative evaluation is the lack of a well-developed criterion
behaviour for each trait of measurement and the subjectivity of
the evaluator, which looms large in such an evaluation.
Qualitative evaluation is thus an attempt to interpret the
evidence about pupils’ growth in different fields in terms of
degree, say, on a five-point scale against the criterion
behaviours predetermined for the purpose.


The term responsive evaluation was used for the first time by
Robert Stake. According to him, responsive evaluation is oriented
more to the programme activities than to the programme intents
and responds to audience requirements for information. It refers
to different value perspectives in reporting the success or
failure of the programme. The responsive evaluator engages
different observers to observe and negotiate the programme. With
the help of the observers he prepares narratives, graphs,
portrayals etc. Through proper negotiations with the programme
personnel, he checks the accuracy of these portrayals, and the
reactions of audience members are recorded. Suitable media are
used to enhance the fidelity of communication, and a final
written report may or may not be prepared, as agreed by him and
his clientele.

Stake prefers to identify the issues instead of the objective 
or the hypotheses. According to him, issues reflect better the 
complexity, immediacy and valuing of the programme. The 
evaluator consults parents, teachers, students and other pro- 
gramme staff and identifies issues or potential problems. It is 
these issues that are basic to continue discussions, for gathering 
data, making observation, to conduct the interviews, test etc. 
All this is done to improve communication with the audience, 
The conventional research reporting is characterised by explicit 


50 Handbook of Pupil Evaluation 


communication, finding relationships among various variables, 
analysing and interpreting in terms of variance or co-variation. 
In contrast, responsive evaluation is an attempt to respond in a 
natural way to understand what the programme is like. The 
people involved have practical experience of assimilating and, 
therefore, more efficient comprehension. Sometimes direct 
experience may not be possible and is substituted by vicarious
experience similar to that which members of the audience use.

Thus, data collection is not limited to one person but is col- 
lected from all persons involved. The methodology of proces- 
sing the data is multi-way communication and discussion by the 
evaluator with the help of the observers. Reporting gives a 
holistic impression, portrays the complexity, the mood and the
mystery of the experiences relating to the programme activities. 
The uncertainty and ambiguity of reports natural to such 
evaluation are its major limitation. The responsive evaluator may
prepare a portrayal as short as a five-minute script or a long portrayal
which involves many narratives, maps, graphs, taped conversations,
photographs, exhibits etc. Responsive evaluation is, therefore,
a new approach: a cooperative enterprise in evaluating the
activities of a programme through continuous discussion with
the programme personnel and reporting of the possible perspectives
on the successes and failures of the programme.


11. Rational, Experimental and Illuminative Evaluation


Rational evaluation is the basic form of all evaluation activities.
It is the most commonly used and usable form of evaluation,
which is based on the intelligent and enlightened opinion of the
evaluator. It may also be called logical evaluation as the
evaluator forms his judgement on some logic he uses as a basis.
The major attributes of rational evaluation are reasoning, logic,
experience, personal opinion and preferred values. The evaluator
uses his own rationale, depending upon the purpose of evaluation.
It has no empirical base. It is a purely subjective
appraisal of a pupil, curriculum or a programme. The objectivity
lies in the relevance of the criteria adopted and the rigour with
which those criteria are used. Since it is not always possible to
agree on such criteria, unless pre-specified or given, its objectivity
is ensured by involving more than one evaluator.

Rational evaluation is done by experts who are conversant
with the nature of the task to be evaluated. As for its scope,
this form of evaluation is applicable to almost all situations.
Even in other types, initially the evaluation is done on a
rational basis only. Take, for example, the relevance of instructional
objectives in science; the suitability of content or topics
in a history syllabus; the sequencing of units of teaching; the
selection of a certain method to teach a particular concept; the
organisation of learning experiences around pupils’ interests, topics or
activities; giving weightage to different forms of questions;
estimating the difficulty level of a question; discouraging overall
options etc. These are all matters of rational evaluation to start with,
although each of them can be empirically verified.

All the above examples indicate that rational evaluation is
basic to all components of the teaching-learning process. The
basic tenet of rational evaluation is the finding out of the logical 
relevance of the material on the basis of experienced expertise. 
The source of data is the evaluator's own fund of knowledge 
and skills and the logic of the discipline. The methodology of 
data processing rests on the criteria evolved in advance. Judge- 
ments involve reflection on the predetermined nodal points. To 
what extent ends and means are articulated is the focus of 
rational evaluation. Its chief advantage is the economy of time 
and expert opinion. Its limitation is its subjectivity in forming 
judgements. As compared to other modes of evaluation this is 
by far the most popularly acclaimed method of evaluation used 
in most situations relating to the educational process. However, 
its wider applicability cannot be considered as an indicator of
its greater validity or reliability.


Experimental evaluation and illuminative evaluation have emerged 
from two distinct paradigms. Each has its own strategy, focus 
and assumptions. The first one, which is more dominant, is
based on the ‘classical’ or ‘Agricultural-Botany’ (3) paradigm derived
from the experimental and mental testing tradition in psychology 
and it therefore utilises the hypothetico-deductive methodology. 
According to Parlett and Hamilton (4), in this type of evaluation
“students—rather like plant crops—are given pretests and then
submitted to different experiences. After a period of time, their
attainment is measured to indicate the relative efficiency of the
methods used.” This type of approach yields objective numerical
data that permit statistical analysis. Variables like I.Q. test
scores, attitude rating, personality profiles etc. are codified and 
processed to indicate the efficiency of the new curricula, media 
or methods. 

Intelligence testing, achievement testing, development of
attitude scales, interest inventories and personality inventories
are all examples of experimental evaluation. It involves control
of variables in a predesigned experiment to meet the dictates of
the paradigm. Testing, which takes place generally before and 
after the research design, is the prominent form of this type of 
evaluation. The evaluator himself is the source of data which 
are gathered by him mainly through written tests. A well set 
methodology is used to process the data. Judgements are 
generally norm-referenced. 

The major limitations of this type of evaluation are the difficulty
of controlling variables, its expense, the time consumed in
collecting data, the absence of provision for adapting the built-in
premise to changed circumstances, over-reliance on
quantitative information, the insensitivity of statistical generalisations
to local and unusual effects, and the failure to articulate the
concerns of participants, developers, sponsors and other interested
personnel. Experimental evaluation is thus vulnerable
to manifold extraneous influences which narrow down its
claimed empirical reality, and it tends to become more centralised
and bureaucratic in its decision-making process.


Illuminative evaluation is based on a contrasting paradigm which
differs from the classical paradigm in research style and the
methodology of processing. It is called the Social-Anthropological
paradigm and is related to psychiatry and participant observation
research in sociology (5). The term illuminative research is drawn
from Trow (6) (1970). This term takes care of the wider context in which a
programme functions. Description and interpretation rather
than measurement and prediction is its concern. Its aim is to
study how a programme operates and is influenced by the
various situations in which it is applied. It attempts to document
what it is like to be a participant in the scheme, besides discerning
and discussing the significant features, recurring concomitants and critical
processes. Thus it addresses and illuminates a complex array
of questions which have a bearing on the process of education
and enlightens the innovator and other interested parties in
identifying those elements or procedures which seem to have
had desirable results (7).

The instructional system and the learning milieu are two
concepts central to the understanding of illuminative evaluation.
Evaluation concentrates on the ‘process’ of learning rather than
on the ‘outcomes’ of learning. Those who are involved in the
process and the conditions and constraints under which a
project works in the class-room are the parameters of this type
of evaluation. Therefore, the perception account of the learning
milieu is essentially like travellers’ tales (8). Its place in the
teaching-learning process is obvious especially in the non-scholastic
area of pupil growth which demands evidence from
all types of participants, students, teachers, parents, peers etc. 
The tools used are mainly observation, discussion, interview 
schedule, opinionnaire etc. The methodology of processing the 
data is subjective estimates and face-to-face discussion with the 
participants. Judgements are descriptive and consensus-based. 
A discussion would lack certainty because of the difficulty of 
the analysis of complex situations in the process of education. 
Illuminative evaluation thus represents a sociological approach
based on an appraisal of the complex interaction of various
parameters having a bearing on the pupils’ learning and forming
judgements on the basis of the consensus of all those who
are participants in the teaching-learning process.


12. Objective-based, Goal-free and Pay-off Evaluation


This tripartite classification is based on the criteria of the
success of a programme. Evaluation theorists like Popham (12)
classify various models of evaluation into three categories: the
goal attainment models; the judgemental models emphasising
extrinsic or intrinsic criteria; and the decision facilitation models.


Objective-based evaluation comes under the first category,
in which emphasis is on the determination of the degree to
which an instructional objective is achieved. It is more ancient
in lineage and is associated with the name of Ralph W. Tyler (14) 
as reflected in the well-known Eight-Year Study of the 
1930s. This approach envisages careful formulation of goals 
which are transformed into measurable or behavioural
objectives. Objective-based evaluation reflects the adequacy or 
inadequacy of the instructional programme, depending upon the 
degree to which the predetermined goals are achieved. This type 
of evaluation is widely used in class-room instruction and 
evaluation as also in many curriculum projects of today. 

The basis of data collection is the intended instructional 
objectives and, therefore, tools and techniques are also geared 
to collect data in terms of the pre-established objective or goals. 
Analysis is done in terms of objectives as also their interpreta- 
tion. The methodology of forming judgements and taking 
decisions rests on the level of attainment of the goals specified
in advance. Its major limitations are the neglect of collateral
efforts which may hinder the attainment of the objectives and the
neglect of the validity of the objectives themselves, which is of
considerable import. Nevertheless, the thrust of objective-based
evaluation remains on the degree of attainment of pre-specified
instructional goals.


Goal-free evaluation is the term coined by Michael Scriven (15)
as a sequel to the inadequate explanation offered by goal
attainment models of evaluation for reflecting on programme effects,
whether intended or unanticipated. It takes care of the results
accomplished and is not concerned with the rhetoric of instructional
objectives. It tries to ignore whatever knowledge of
project goals is likely to contaminate the appraisal of programme effects.

It encourages the evaluator to take care of a wider range of
outcomes of a programme than the objective-based evaluator,
who develops a ‘tunnel-vision’ due to focussing all the time on
judging the attainability of the project goals.

Scriven does not consider goal-free evaluation as a replace- 
ment of goal-based evaluation but a supplement to the objective- 
oriented framework. According to him, goal-free evaluation 
may function internally or externally. The goal-free internal 
evaluator, who is a member of the curriculum developer’s team, 
can assess the programme outcomes while the goal-free external
evaluator who is uncommitted to the programme can also do 
summative evaluation. The Humanities Curriculum Project of 
U.K. directed by Stenhouse (16) is an example of goal-free
evaluation. Although many of the Indian projects are goal-free
in their origin, they are no longer concerned with the various
programme outcomes, consequences or effects. 

The place of goal-free evaluation cannot be over-emphasised. 
Most informal class-room evaluations are goal-free and are 
concerned more with instructional effects than with instructional 
objectives. The source of data is not only the test scores in 
this case but also other tangible impact and collateral effects. 
Therefore, the opinions of students and other teachers matter a
lot besides the formal evaluation. The processing of data and
methodology of forming judgements are in accordance with the
conditions, constraints and other factors which have a bearing 
on programme effects. The decisions relate not to the attain- 
ment of objectives but to the observed outcomes of the 
programme. A major limitation of goal-free evaluation is the 
lack of accepted criteria or goals against which the evidence 
collected could be validated. Thus goal-free evaluation should 
be considered as complementary to, rather than a substitute
for, objective-based evaluation. 


Pay-off Evaluation is the outcome of the judgemental model 
of evaluation, which involves extrinsic or intrinsic criteria. In 
both, professional judgements are involved. The evaluator’s
influence is so great that the evaluation may turn out to be favourable or
unfavourable. Such judgements can use intrinsic
criteria or extrinsic criteria. The former are generally referred
to as process criteria and the latter as product criteria (not
product evaluation). While evaluating a class test, intrinsic criteria
may be the proper coverage of objectives and content or the
weightage given to the form of questions. On the other hand,
the extrinsic criteria may be the efficiency of the test as judged
by students’ performance on the test when administered. The
term ‘pay-off evaluation’, coined again by Scriven, is based on
the judgemental model of evaluation involving extrinsic criteria.
These criteria relate to the programme effects in contrast to the 
intrinsic criteria which are concerned with the internal characteristics
of an instructional programme. According to him,
intrinsic factors also cannot be ignored and thus he is not
disdainful of intrinsic evaluation. In fact, he suggests hybrid
evaluation which combines both intrinsic and extrinsic evalua- 
tions. 

In most empirical investigations, extrinsic criteria
form the basis of judgement, and these are therefore pay-off evaluations.
Data are gathered by personnel uncommitted to the programme.
Decisions have to be taken in the light of criteria external to the 
programme. Compared to the other two types, pay-off evalua- 
tion may or may not be goal-oriented but this type of evalua- 
tion leans heavily on the assessment of dividends in terms of 
extrinsic criteria. 


13. Product, Process and Product-cum-Process Evaluation


The dichotomy of product and process seems to have originated
in the science laboratory or a wood workshop where performance
of a task is involved. Product evaluation relates to finding
out the worth of a production, result, outcome or a yield. To
distinguish it from other types of produce, this term is restricted
to the psychomotor effects or consequences. In other words,
product evaluation refers to the appraisal of skill performance
which yields some product. Such evaluation may be restricted
to the completed product or it may be extended to the assessment
of techniques or the process of production. The product
of performance and process of performance are two attributes
of an end-product resulting from a psychomotor activity.
Accordingly, the terms product evaluation and process evaluation
are in use. However, when both product and process are to
be assessed, we use the term product-cum-process evaluation.

This type of evaluation is applicable to practical examinations
in science subjects and in assessment of performing arts
like painting, dancing, sculpture, typing, wood-work, or socially
useful productive work. The product of performance may be
identification of a slide or a specimen or its correct classification,
a piece of improvised apparatus, a summary or findings
of an experiment, a completed painting or a piece of sculpture
or an article of wood-work. For assessment purposes the three
basic criteria are: the quality, the quantity and the speed.
Accordingly, quality scales, rating scales or timing devices are
used to collect the needed evidence about the product being 
appraised. When applied to the cognitive domain, product 
objectives are listed in terms of major concepts and generalisa- 
tions arrived at through the inquiry processes. 


Process evaluation refers to the various steps, procedures or
inquiry processes undertaken to develop the product intended.
It means the process of performance that leads to a particular
product of performance. In the production of a model (product)
one might have to use various processes like the selection of
appropriate tools and instruments, their efficient use, observation,
say under the microscope, and relating the observed with the
intended. These are the various procedural or operational steps
representing the process of performance. The product is the
model. Likewise, the A.A.A.S. Plan (11) highlights the role of the
processes of scientific inquiry in the understanding of science at
the elementary stage. In the lower primary grades, the processes
of science recommended are observing, using the space-time
relationship, using numbers, measuring, classifying, communicating,
predicting and inferring. In the upper primary grades, the
processes recommended in addition are formulating hypotheses,
controlling variables, interpreting data, defining operationally
and experimenting. These are termed process objectives, and
evaluation in terms of process objectives is called process
evaluation.

The dichotomy of process and product is more apparent
than real. In fact, conceptualisation of both as complementary
to each other is necessary. Thus product-cum-process evaluation
is the term coined for combined evaluation. This does not,
however, mean that both have equal significance. What is
important is the evaluation of both the process of performance
as well as the product of performance. In certain cases, the
process is more important than the product of performance.
For example, while analysing a salt to identify it, the procedural
steps in sequence, i.e. the process of performance, are more important
than mere identification of the salt. On the other hand, a
model or a painting can be evaluated without any
regard to the process used in making of these items because the
quality depends on the efficiency of the process used. Likewise,
there are situations where both product and process are significant
from the point of view of evaluation. For example, in
preparing a slide for examination, we require specific sequential 
steps which must be appraised as also the final product i.e. the 
quality of the slide prepared. The time taken to yield an end- 
product and the quality of the product are essential for product 
evaluation. Process evaluation demands on-the-spot observation 
of the process of performance and in some cases even the
written record supplements the evidence about efficiency of
the process.


For data collection use of a checklist can be made to record,
on the basis of observation, the presence or absence of certain
component skills or integrated skills. The methodology of
judgement making is, by and large, criterion-referenced. Quality
scales, rating scales or checklists used presuppose the requisite
quality or standard of performance against which the
product-process can be appraised on the basis of comparison. Analysis
of the processes of performance helps the teacher to locate
areas or skills that cause difficulty in learning. It is on the basis
of the process evaluation that the evaluator can pass a better
judgement on the product of performance. Process evaluation
has a great diagnostic value on the basis of which instruction
can be improved, as also the standard of performance of
students. Process evaluation is formative in nature while
product evaluation is summative in its character.


14. Micro, Macro and Elemental Evaluation


Micro evaluation represents that process of collecting evidence
and forming judgements that relate to a very small sample or
content or area of operation. A major attribute of a
micro-evaluation programme is its highly restricted sample which forms
the basis for evaluation. Like micro teaching, micro-evaluation
is a scaled-down evaluation encounter in which the complexity
of the evaluation process is simplified in size and time. This
can be done in two different ways. One, that all steps of evaluation
may be taken to evaluate a product or process but the
sample size is reduced to the minimum. For example, a teacher
may take a small unit, a module or even a concept and then
evaluate the student on that portion only. Development of a 
concept can be tested at various levels of instructional objectives 
ranging from simple recall to the ability to hypothesise and 
evaluate. The other approach to evaluate students could be to 
take a particular objective and evaluate them on various be- 
havioural outcomes like ability to recall, recognise, translate, 
interpret, extrapolate etc. In its extreme form, only one specific 
intended learning outcome, say the ability to interpret, may be 
taken and a pupil's achievement gauged with respect to that
specific objective only. 

Micro evaluation is, therefore, a more intensive probe with
respect to a limited sample of content or an objective. When
the time is short, the evaluator can judge the total sample on
the basis of a limited and similar sample. But it is, in fact, the
teacher who needs more often micro-evaluation when he obser- 
ves repeated failure of students to understand a concept or 
achieve a particular objective. Such an evaluation is focussed 
more on feedback of results rather than on grading of achieve- 
ment. It is, therefore, formative in its approach. Data collec- 
tion is based on a comparatively more controlled variable and 
is, therefore, more reliable. Judgements are generally criterion- 
referenced and decisions are directed to improve learning and 
teaching. 


Macro evaluation, sometimes called mega evaluation, is a large-scale
evaluation effort which has extensive coverage of content,
objective or sample size. It is an attempt to interpret data in
terms of a large sample. Extensiveness may vary vertically,
ranging from one class to all classes in a school, or it may be
horizontal assessment of the curricular and non-scholastic traits
of students from a school or a state. Macro-evaluation, unlike
micro-evaluation, provides a megascopic view of students’
growth. Public examinations, standardised tests, internal
assessment programmes, a crash programme for evaluating textbooks
in various subjects, are examples of different kinds of
macro-evaluation. The focus of such evaluation is on measurement
of growth and the judgements formed are generally
summative in nature. Decision focus is norm-referenced comparison
or status position.


In fact, extensiveness or intensiveness of evaluation effort 
in micro and macro evaluation can be viewed on a continuum 
ranging from measurement of a single concept and ability in 
class (micro) to that of a nationwide programme of evaluation 
like that of the Talent Search Examination of the N.C.E.R.T. or 
the U.P.S.C. Central Services Examinations. The purpose varies 
from diagnosis and improvement of learning to that of grading 
and certification of students, curriculum or a project as we move 
from one end of micro-evaluation to the other end of macro- 
evaluation. 


Elemental evaluation is an attempt to form judgements about
a single but specific and predetermined element or attribute of
a person, text, curriculum or a programme. For example, a
history textbook may be evaluated only from the point of view
of national integration or a science text from the point of view
of investigatory approach. Likewise, a curriculum can be evaluated
from the point of view of the requirement of the learner
or social needs etc. A programme may be evaluated in terms of 
its context, input, process, product, impact or collateral effects.
Similarly, pupils’ evaluation may be done only to judge their 
civic sense, cooperativeness, expression, handwriting or spellings. 
The focus is only on one element or component of learning, 
curriculum, text, a programme or a project. 

Elemental evaluation differs from micro-evaluation in that 
it may give a microscopic or megascopic report depending upon 
the sample used. The one refers to the scope aspect of evalua- 
tion which may be elemental or comprehensive, whereas the 
other refers to the nature and scope of the evaluative effort to 
be made with respect to that aspect. Basic to elemental evaluation
is the identification of the expected behavioural outcome,
i.e. the criterion behaviours or the indicators of success. This
type of evaluation can be undertaken, depending upon the 
preferred objectives, values, aspects or elements. Elemental 
evaluation is thus an attempt to interpret the evidence about a 
person, programme or a project in terms of a single aspect
about which judgements are formed and decisions are taken.


15. Programme Evaluation, Curriculum Evaluation and 
Pupil Evaluation


Programme evaluation refers to all the activities or steps taken
to judge the effectiveness of a programme or a project. A
programme may be a project only or it may consist of more
than one project in the total programme. A programme of non-formal
education may, for example, be composed of different
projects like functional literacy, adult education, development
of literature for the neo-literate, evaluation of pupils’ growth
etc. A programme is, therefore, to be evaluated in terms of its
objectives as well as the means adopted for the achievement of
those objectives. Since every programme is conceived in its own
context and is born in its own soil, it has to be evaluated in the
same context. Likewise, inputs provided, procedures adopted
for implementation, materials prepared, outcomes or products
accrued and impact made are the different aspects which come
under the purview of programme evaluation. Thus terms like
context evaluation, input evaluation, process evaluation, outcome
evaluation, product evaluation, impact evaluation,
materials evaluation, curriculum evaluation, pupil evaluation
etc. are all used under the umbrella of programme evaluation.
A programme evaluation, therefore, may include curriculum
evaluation as well as pupil evaluation because one of the
criteria of effectiveness of a programme may be the effectiveness
of the curriculum or the pupils’ growth as a result of programme
impact.

Data collection involves all the four major techniques, 
namely observation, opinion, analysis and testing. Methodology 
of judgement-making varies with the programme intents or with 
reference to a comparative programme. Decisions generally
lead to programme acceptance, programme modification or
programme termination.


Curriculum evaluation is a part of programme evaluation or an
independent project when the programme itself is one of developing
a curriculum. Since curriculum is a comprehensive term used
to include curriculum objectives, curriculum content, curriculum
methodology and evaluation, it involves evaluation of the objectives,
syllabus, textual material or other materials, methodology of


CHAPTER III


BASIC STATISTICAL CONCEPTS 
IN MEASUREMENT 


Statistics is a basic tool of measurement, evaluation and 
research concerned with gathering, analysing and interpreting 
numerical data by using various processes or mathematical 
techniques. This word is also used to describe the collected 
numerical data. On the basis of individual observations, group 
characteristics are abstracted to form generalisations by 
means of statistical data. The average family income, the
average age of 6th grade boys (11 years) and the average marks
secured in Mathematics by 8th class students are all statistical
concepts. For a research worker, there is need for identifying
the facts relevant to the questions or the hypothesis to be tested, 
the mode of collecting, organising and analysing observations, 
assumptions underlying the methodology of processing data and 
the validity of conclusions. Thus systematic observations and 
description of characteristics of objects and events, finding rela- 
tionships between various variables, making generalisations and 
predicting future occurrences are basic to research. Measurement 
is the universally accepted and most precise method of describ- 
ing and assigning quantitative values to the characteristics of 
objects and events. 

It is expected that after going through this chapter the 
reader will be able to 


(a) differentiate between different types of scales, parametric
and non-parametric data, descriptive and inferential statistics
etc.;

(b) organise data into frequency distributions, cumulative
frequencies etc. for making scores meaningful;

(c) calculate measures of central tendency, standard deviation,
variance, relationships like co-efficient of correlation and
use them to interpret measurement data;

(d) understand the concept of normal probability curve and its
application;

(e) compute reliability co-efficient using different methods, the
standard error of measurement and standard error of
estimate;

(f) appreciate the role of basic statistical concepts in understanding
measurement data and their meaningful interpretation
for comprehending the process of measurement and
evaluation.


1. Measurement Levels 


The nature of variables and the precision of measuring instruments
determine the level of precision and sophistication of the
measure or the scales. Following are the four scales in use: 


1.1. Nominal Scale 


For example in an office we have: 


                         Female   Male   Total
Lower division clerks      29       9      38
Upper division clerks      15       5      20
Assistants                 10       3      13
Superintendents            04       2      06
Section Officers           02       1      03
Total                      60      20      80


—Objects (Persons) are categorised.

—Each individual can be a member of one set only.

—All members of the same set have the same defined characteristics.

—They are non-orderable, i.e. cannot be ranked.

—Counting is the only feasible method of quantification for
statistical analysis.

Such a scale, which indicates or describes differences between
things by assigning them to categories such as nationalities, genders,
educational levels, occupations etc., is called a nominal scale.
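
The point that counting is the only quantification a nominal scale permits can be illustrated with a minimal sketch in Python, using the office figures given above. The names `staff`, `row_totals`, `column_totals` and `modal_category` are hypothetical labels chosen for this illustration and do not come from the text.

```python
# Office staff counts from the example above: {category: (female, male)}.
# The name `staff` is a hypothetical label chosen for this sketch.
staff = {
    "Lower division clerks": (29, 9),
    "Upper division clerks": (15, 5),
    "Assistants": (10, 3),
    "Superintendents": (4, 2),
    "Section Officers": (2, 1),
}

def row_totals(table):
    """Count the members of each category (female + male)."""
    return {name: f + m for name, (f, m) in table.items()}

def column_totals(table):
    """Count females and males across all categories."""
    females = sum(f for f, _ in table.values())
    males = sum(m for _, m in table.values())
    return females, males

def modal_category(table):
    """Return the category with the largest count (the mode)."""
    totals = row_totals(table)
    return max(totals, key=totals.get)

print(row_totals(staff)["Assistants"])  # 13
print(column_totals(staff))             # (60, 20)
print(modal_category(staff))            # Lower division clerks
```

Only frequencies and the most frequent category are computed here; no arithmetic is attempted on the category labels themselves, which is the sense in which such data are non-orderable.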


1.2. Ordinal Scale 


For example the height of a group of girls is as under: 


Girls    Height     Difference    Rank
         in cms.    in cms.
Rita      160          -          1st
Nita      158          2          2nd
Sita      155          3          3rd
Gita      150          5          4th
Mita      142          8          5th


—Ordering of things into more than or less than adjacent
things.

—Ordering is expressed in terms of rank in the group.

—No absolute values, and the difference in adjacent ranks may
not be equal.

—Ranking spaces them equally but they may not be equally
spaced.

Such a scale, which not only indicates that things differ but
also that they differ in amount or degree, is called an ordinal
scale.


1.3. Interval Scale 


It is based on equal units of measurement. It indicates the
magnitude of a given characteristic or property. A difference
between a score of 75 and 76 is assumed to be equivalent to the
difference between 50 and 51 and between 16 and 17. Since it cannot
measure the complete absence of a trait, its main limitation is
the lack of a true zero. Psychological tests and inventories are
interval scales.


1.4. Ratio Scales 


Like the interval scale, it has the equal-interval property. In addition,
it has a true zero. For example, the zero point on a centimetre
scale indicates complete absence of length or height.
Numerals of a ratio scale can be added, subtracted, multiplied,
divided and expressed as ratios, e.g. 2 gms. is one half
of 4 gms. and twice 1 gm.


2. Data 


2.1. Parametric Data 


It is measured data. Parametric statistical tests assume that
data are normally or nearly normally distributed and are applicable
to both interval and ratio scaled data.


2.2. Non-parametric Data 


Data of this type are either counted or ranked. The tests applied to
them are also called distribution-free tests and do not rest on the
assumption of normally distributed populations. To summarise:


(a) Level          1                 2                    3                     4
(b) Scale          Nominal           Ordinal              Interval              Ratio
(c) Process        Counted and       Ranked in order      Measured, equal       Measured, equal
                   classified                             intervals, no true    intervals, true zero,
                                                          zero                  ratio relationship
(d) Data           Non-parametric    Non-parametric       Parametric            Parametric
    treatment
(e) Appropriate    Chi-square        Spearman's rho (ρ)   t-test, analysis of variance,
    test           Median            Mann-Whitney         analysis of covariance,
                   Sign              Wilcoxon             factor analysis, Pearson's r


3. Descriptive and Inferential Analysis 


When statistical analysis is limited to generalisations applicable
to the particular group of individuals observed, it is descriptive
analysis. The data describe one group and that group only, and no
conclusions are extended beyond this group. Simple action
research is an example of this type.




When analysis involves a sampling process to select a small
group which is assumed to be related to a larger group from
which the small group is drawn, it is inferential analysis. The
small group is called the sample while the large group is called
the population. Thus drawing conclusions about the population on the
basis of observations of the sample is the purpose of inferential
analysis.


4. Statistic 


When the word statistic is used in the singular, it means a measure
based on observations of a characteristic of a sample (mean, standard
deviation etc.). A statistic computed from a sample may be
used to estimate a parameter, the term used to connote the
corresponding value for the population from which the sample is
drawn.

A sample must, therefore, approximate to the larger group,
i.e. the population. Only then is it possible to estimate the
characteristics of the population by analysing the characteristics
of the sample. If sampling is not done carefully, the research
worker should restrict the findings to the group observed and
should not apply them to other individuals or groups. The
statistical theory of sampling is complex and the details are not
warranted in this introductory treatise.


5. Organising Data for Making Scores Meaningful 


Let us take the following scores of 10 students, written either
roll-number-wise or alphabetically.

 1. Amita  = 31        6. Lalita   = 36
 2. Anita  = 16        7. Manjeeta = 29
 3. Babita = 35        8. Ranjeeta = 35
 4. Kavita = 28        9. Sarita   = 26
 5. Gita   = 38       10. Vanita   = 26

From data in this form it is not convenient to find the highest
and lowest scores, the average score, or how the score of Kavita
(28) compares with those of others in her class.


5.1. The Array 


The same scores when arranged in say, descending order, as 
38, 36, 35, 35, 31, 29, 28, 26, 26, 16, become more meaningful 
and at a glance we can know the highest score (38), the lowest 
score (16), the average score (30) and the performance of 
Kavita with 28 marks (just below average). 


5.2. The Rank 


These scores can now be given ranks 1, 2, 3, ..., 10, and
the teacher can report ranks without giving the raw scores.
When two scores are the same, the ranks the two students would
jointly occupy are added and divided by two, and both students
are given that same rank, as under:

Score: 38   36   35    35    31   29   28   26    26    16

Rank:  1    2    3.5   3.5   5    6    7    8.5   8.5   10
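
The tie rule above can be sketched in a few lines of Python. This is only an illustration; the function name `average_ranks` is hypothetical and the scores are those of the array in section 5.1.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest.

    Tied scores share the mean of the rank positions they occupy.
    The result is listed in descending order of score.
    """
    ordered = sorted(scores, reverse=True)
    positions = {}
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    return [sum(positions[s]) / len(positions[s]) for s in ordered]


scores = [38, 36, 35, 35, 31, 29, 28, 26, 26, 16]
ranks = average_ranks(scores)
print(ranks)
```

The two scores of 35 occupy positions 3 and 4, so each receives (3 + 4) / 2 = 3.5; the two scores of 26 likewise receive 8.5.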


5.3. Frequency Distribution (F.D.) 


When there are 20 or more scores, ranking becomes difficult
and time consuming. In such cases scores are grouped into
intervals of 2, 3, 5 or 10 depending upon the range and number
of scores. Scores are then tallied in each interval. Tallies are
counted and the number is put under the frequency column.
The sum of the tallies/frequencies should equal the number of
scores. The following example illustrates the frequency
distribution of 20 students' scores on a unit test in Biology.

Scores: 38, 21, 35, 39, 31, 33, 32, 26, 29, 28, 

24, 34, 25, 32, 30, 33, 31, 30, 28, 23 


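TABLE 1: Frequency distribution of scores of 20 students in Biology

Interval    Tallies    Frequency
39-40       |          1
37-38       |          1
35-36       |          1
33-34       |||        3
31-32       ||||       4
29-30       |||        3
27-28       ||         2
25-26       ||         2
23-24       ||         2
21-22       |          1
                       N = 20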




Steps 

(a) Intervals can be set between 20 and 40. Dividing the
range (39 - 21 = 18) by 10 gives an estimate of the size
of interval: 18 ÷ 10 = 1.8. Therefore, in this case an interval
of 2 was used.

(b) Establish the highest interval (39-40) and other intervals 
down through the lowest interval containing lowest scores 
(21-22). 

(c) Tally the individual scores in each interval. Count the 
tallies and write the number in the frequency column. 

(d) Add the frequency column to check that the sum equals 
the number of scores in the collection. 
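
Steps (b) to (d) can be sketched in Python as follows. This is a minimal illustration that assumes whole-number scores; the function name `frequency_distribution` and its parameters are hypothetical.

```python
def frequency_distribution(scores, low, width):
    """Count the whole-number scores falling in each interval.

    Intervals start at `low` and each covers `width` consecutive values,
    e.g. low=21, width=2 gives 21-22, 23-24, and so on.
    """
    table = {}
    for start in range(low, max(scores) + 1, width):
        end = start + width - 1
        table[(start, end)] = sum(1 for s in scores if s in range(start, end + 1))
    return table


biology = [38, 21, 35, 39, 31, 33, 32, 26, 29, 28,
           24, 34, 25, 32, 30, 33, 31, 30, 28, 23]
table = frequency_distribution(biology, low=21, width=2)

# Print from the highest interval down, with tallies and frequency.
for (start, end), f in sorted(table.items(), reverse=True):
    print(f"{start}-{end}  {'|' * f}  {f}")

# Step (d): the frequencies must add up to the number of scores.
total = sum(table.values())
print("N =", total)
```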


A score interval with an odd number of units may be preferred to
one with an even number because its mid-point is a whole number and
not a fraction. It is, therefore, convenient for computation, since all
scores in an interval are assumed to fall at the mid-point.


Example: Interval of 6 (even) 7, 8, 9, 10, 11, 12 (mid-point 9.5) 
Interval of 5 (odd) 7, 8, 9, 10, 11 (mid-point 9.0). 


6. Statistical Measures 


To describe and analyse the data in a meaningful way you
should be able:

1. to calculate the measures of
(a) central tendencies or averages like mean, median and
mode,

(b) dispersion or spread like deviations, variance, and standard
deviation,

(c) relationship like that of coefficient of correlation.

2. to use these measures on the given data for meaningful
interpretation of test results.


6.1. Measures of Averages (Central Tendencies) 


(i) Mean (M)
Arithmetic average, grade point average and arithmetic mean
all represent the mean value and are very useful in classroom
measurement. A widely used index of average is the arithmetic
mean, which is calculated by adding the separate scores and
dividing the sum by the number of individual scores in the
collection.

Ungrouped data

$$\text{Mean} = \frac{\text{Sum of scores}}{\text{Number of scores}} \quad\text{or}\quad M = \frac{\Sigma X}{N} \tag{1}$$

where ΣX = sum of scores in a distribution and N = number of scores.

In Table 1 the sum of the 20 scores is 602.

Thus N = 20 and ΣX = 602.

$$\text{Mean} = \frac{602}{20} = 30.1$$
TABLE 2: Calculation of Mean from Grouped data

Interval    X (Mid-point)    f (Frequency)    fX
39-40       39.5             1                39.5
37-38       37.5             1                37.5
35-36       35.5             1                35.5
33-34       33.5             3                100.5
31-32       31.5             4                126.0
29-30       29.5             3                88.5
27-28       27.5             2                55.0
25-26       25.5             2                51.0
23-24       23.5             2                47.0
21-22       21.5             1                21.5
                             N = 20           ΣfX = 602.0

$$\text{Mean} = M = \frac{\Sigma fX}{N} = \frac{602}{20} = 30.1 \tag{2}$$


Here, by chance, the mean is the same for grouped and ungrouped
data. Generally, there is a small difference between the
means calculated by the grouped and the ungrouped data
methods. The concept of the mean is useful not only in understanding
the performance of a group but also because it is used
as a basic statistic to compute other statistics like standard
deviation etc.
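
Formulas (1) and (2) can be checked side by side with a short Python sketch. The variable names are illustrative; the data are the 20 Biology scores and the mid-points and frequencies of Table 2.

```python
biology = [38, 21, 35, 39, 31, 33, 32, 26, 29, 28,
           24, 34, 25, 32, 30, 33, 31, 30, 28, 23]

# Formula (1): ungrouped data, M = sum of X / N.
ungrouped_mean = sum(biology) / len(biology)

# Formula (2): grouped data, M = sum of f*X / N, with X the interval mid-point.
grouped = {39.5: 1, 37.5: 1, 35.5: 1, 33.5: 3, 31.5: 4,
           29.5: 3, 27.5: 2, 25.5: 2, 23.5: 2, 21.5: 1}
n = sum(grouped.values())
grouped_mean = sum(mid * f for mid, f in grouped.items()) / n

print(ungrouped_mean, grouped_mean)
```

The two results coincide for this data set only; with other scores the grouped mean is an approximation, because every score is treated as if it fell at the mid-point of its interval.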


(ii) Median (Md) 


Median is a score point (not necessarily a score) in an array,
below and above which one half of the scores fall. It is a
measure of position rather than of magnitude and can be found
by inspection, without calculation, if the scores are arranged in
ascending or descending order. A median can be calculated
as under:

(a) If the number of scores is odd:
Use formula:

$$\text{Median} = Md = \text{Mid-score} = \left(\frac{N+1}{2}\right)\text{th score in an array} \tag{3}$$

Example

Scores
5 }
4 } 2 scores above
3   Median: (5 + 1)/2 = 3, i.e. the 3rd score from below
2 }
1 } 2 scores below

(b) If the number of scores is even:
Use formula:

$$\text{Median} = Md = \text{Mid-point between the } \left(\frac{N}{2}\right)\text{th and } \left(\frac{N}{2}+1\right)\text{th scores}$$

Example

Scores
6 }
5 } 3 scores above
4 }
    Median 3.5: 6/2 = 3, so the middle scores will be 3 and 4, or (3 + 4)/2 = 3.5
3 }
2 } 3 scores below
1 }


The median is useful and is used widely in classroom work
because

(a) it is easily found by just ordering the scores,
(b) it divides the total scores into two halves, which depicts the
average in a way examinees can understand better,
(c) it is not influenced by extreme scores at either end of the
distribution and is therefore a more realistic measure.


Example

Scores        Scores
5             24
4             5
3 (Median)    3 (Median)
2             2
1             1

In this case the mean is 3 in the first set and 7 in the second, whereas
the median is 3 in both the cases.
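
The effect of an extreme score on the mean, but not on the median, can be verified with Python's standard `statistics` module. The sketch takes the lowest score in each column as 1, which the stated means of 3 and 7 require.

```python
from statistics import mean, median

first = [5, 4, 3, 2, 1]
second = [24, 5, 3, 2, 1]   # same set with one extreme score

print(mean(first), median(first))     # mean and median of the first set
print(mean(second), median(second))   # the extreme score moves only the mean

# An even number of scores: the median is the mid-point of the two middle scores.
even_median = median([6, 5, 4, 3, 2, 1])
print(even_median)
```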


(iii) Mode (MO) 

Mode is that score which occurs most frequently in a distribution.
It is located by inspection rather than by computation. In
grouped data, the mode is assumed to be the mid-score of the
interval in which the greatest frequency occurs.


Ungrouped data        Grouped data

Scores                Interval    Frequency
8                     19-20       8
[illegible]           17-18       [illegible]
5 }
5 } Mode              15-16       15    Mode = mid-score of (15-16) = 15.5
5 }
4                     13-14       4
3                     11-12       3

Examples 


— Modal age of 5th class students is 11 years 
— Modal size of gents’ shirts is 38 cm. 


This means that there are more 11 year-old students in class 
V than of any other age. Likewise, there are more people with 
38 cms' collar size than any other size. 

A distribution may be bimodal or multimodal. For example,
the number of students travelling by D.T.C. on working days
may be higher between 8 and 9 A.M. and again between 5 and 6
P.M., thereby giving a bimodal distribution.


6.2. Measures of Dispersion or Spread 


A measure of dispersion or spread is an index that describes
the extent to which scores are spread out from the average
score. In other words, it indicates the heterogeneity or homogeneity
of the scores and provides a basis for comparison of
different collections of scores.


(i) Range 
It is the simplest measure of dispersion and is simply the 
difference between the highest and the lowest score. Although it 
is useful for making simple interpretation, it is unreliable and 
misleading as it is based on two most atypical scores. 

Examine the following data showing scores of two groups 
on the same test. 


Group I       Group II
20            20
5             16
4             11
3             7
2             3
1             1
Range = 19    Range = 19

Although the range is the same in both, the scores in Group I,
apart from the highest, are quite close together, while those in
Group II are spread across the entire range.


(ii) Deviation from the Mean (Small x) 
A deviation score is the score expressed as its distance from the 
mean. 


x=(X-M) where X=Score; M=Mean. 


The deviation score is positive (+) if the score (X) falls above
the mean (M).

The deviation score is negative (-) if the score (X) falls below
the mean (M).


Example

Group I                      Group II
Pupil     Score    Grade     Pupil       Score    Grade
Amita     10       A         Lalita      9        B
Anita     9        B         Manjeeta    8        C
Babita    8        C         Ranjeeta    8        C
Kavita    7        D         Sarita      8        C
Gita      6        F         Vanita      7        D
ΣX = 40                      ΣX = 40
N = 5                        N = 5
M = 40/5 = 8                 M = 40/5 = 8
Md = 8                       Md = 8

The mean and median are the same, yet Group I is heterogeneous
compared to Group II, which is more homogeneous.

Using the above scores to compare the two groups, let us
find out the deviation scores.

Group I                      Group II
Score    x i.e. (X-M)        Score    x i.e. (X-M)
10       +2                  9        +1
9        +1                  8        0
8        0                   8        0
7        -1                  8        0
6        -2                  7        -1
ΣX = 40  Σx = 0              ΣX = 40  Σx = 0
N = 5                        N = 5
M = 8                        M = 8

Thus the sum of the scores' deviations from the mean equals zero:
Σ(X-M) = 0, i.e. Σx = 0.
Therefore, the mean is that value in a distribution about which the
sum of the deviation scores equals zero.


(iii) Variance (σ²)
Variance is the sum of the squared deviations from the mean
divided by N. If we square each deviation from the mean, i.e.
x², we get a positive score. These scores can be summed up
and divided by N, and the mean of the squared deviations
computed.

$$\text{Variance} = \sigma^2 = \frac{\Sigma (X - M_x)^2}{N} = \frac{\Sigma x^2}{N} \tag{4}$$

where x = (X - M_x)
and M_x = Mean of X scores.

Thus variance is indicative of how all scores in a distribution
are distributed or dispersed about the mean. This concept is
employed in a number of statistical interpretations of data.


(iv) Quartile deviation 

Since extreme scores affect the size of the range, it is difficult to
interpret unless these extreme scores are neglected. This is done
in the quartile deviation. If the range is computed over the middle
50 per cent of the scores only, instead of the whole range, we can
avoid the effect of extreme scores on our results. Quartiles, like the
median, are points on the continuum and may not always fall at
points occupied by scores.


Example

                                 Middle 50% of scores
Distribution-A:   5  6  8   |   9  9  9  11  12  14   |   15  17  20
                       First Quartile            Third Quartile

Distribution-B:   5  5  5   |   6  6  6  7   7   9    |   10  10  10

(a) There are 12 scores in each distribution.

(b) Isolate 50% of the scores (i.e. 6) in the middle.

(c) The first quartile or Q1 is the point which separates the first
quarter from the middle, i.e. the point below which lie 25%
of the scores.

(d) The third quartile or Q3 is the point which separates the third
quarter from the middle, i.e. the point below which lie 75%
of the scores.

(e) The third quartile (Q3) in Distribution-A falls between 14
and 15, i.e. 14.5.

(f) The first quartile (Q1) in Distribution-A falls between 8 and
9, i.e. 8.5.

(g) Interquartile range = Q3 - Q1 or 14.5 - 8.5 = 6.0.




(h) Semi-interquartile range = 6/2 = 3. It is also called Quartile
Deviation.
Therefore,

$$\text{Quartile deviation } Q = \frac{Q_3 - Q_1}{2} = \frac{14.5 - 8.5}{2} = \frac{6.0}{2} = 3.0$$

(i) For Distribution-B

$$\text{Quartile deviation } Q = \frac{9.5 - 5.5}{2} = \frac{4.0}{2} = 2.0$$


Thus the difference between the quartile deviation of Distribution-A,
i.e. 3.0, and that of Distribution-B, i.e. 2.0, now gives a clearer
picture of the difference in the spread of scores of the two
distributions.
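
The quartile procedure of this example can be sketched in Python. The sketch places each quartile midway between the scores on either side of the quarter boundary, as in steps (e) and (f), and therefore assumes that the number of scores is a multiple of four. The function name is hypothetical, and the seventh score of Distribution-A is taken here as 11, an assumption that does not affect the quartiles.

```python
def quartile_deviation(scores):
    """Return (Q1, Q3, quartile deviation) for a list of scores.

    Each quartile is the mid-point between the two scores that straddle
    the quarter boundary; N must be a multiple of 4 for this simple rule.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n % 4 != 0:
        raise ValueError("this sketch needs N to be a multiple of 4")
    q = n // 4
    q1 = (ordered[q - 1] + ordered[q]) / 2
    q3 = (ordered[3 * q - 1] + ordered[3 * q]) / 2
    return q1, q3, (q3 - q1) / 2


dist_a = [5, 6, 8, 9, 9, 9, 11, 12, 14, 15, 17, 20]   # seventh score assumed
dist_b = [5, 5, 5, 6, 6, 6, 7, 7, 9, 10, 10, 10]

result_a = quartile_deviation(dist_a)
result_b = quartile_deviation(dist_b)
print(result_a, result_b)
```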


(v) Standard deviation (σ) (ungrouped data)

Standard deviation is based on the difference between each
score and the mean. The larger the differences are, the more
spread out the scores are and the larger the standard deviation
will be. It is the positive square root of the variance, as shown
in the formula below:

$$\text{S.D.} = \sqrt{\frac{\text{Sum of (each score} - \text{the mean)}^2}{\text{Number of scores}}} \quad\text{or}\quad \sigma = \sqrt{\frac{\Sigma x^2}{N}} \tag{5}$$


Example

Score    x     x²
42       +2    4
41       +1    1
40       0     0
39       -1    1
38       -2    4
               Σx² = 10

Variance = 10/5 = 2        Standard deviation = √2 = 1.414
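
Formulas (4) and (5) translate directly into Python. The sketch below is illustrative only; it divides by N, as the text does, and the function names are hypothetical.

```python
from math import sqrt


def variance(scores):
    """Mean of the squared deviations from the mean (divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def standard_deviation(scores):
    """Positive square root of the variance."""
    return sqrt(variance(scores))


example = [42, 41, 40, 39, 38]
print(variance(example), round(standard_deviation(example), 3))

# The two groups compared earlier share a mean of 8 but differ in spread.
group_1 = [10, 9, 8, 7, 6]
group_2 = [9, 8, 8, 8, 7]
print(variance(group_1), variance(group_2))
```

The second comparison puts a number on the earlier observation that Group I is more heterogeneous than Group II.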


Simple formula for computing S.D.
The easiest formula, which approximates to the above formula
and can be used by every classroom teacher, is given here:

$$\text{S.D.} = \frac{\text{Sum of the highest 1/6 of scores} - \text{Sum of the lowest 1/6 of scores}}{\tfrac{1}{2}(N-1)} \tag{6}$$

where
N is the number of scores in a group.


Example 

Scores x x? 
13 + 6 36 
12 + 5 25 
11 +4 16 
10 +3 9 
9 + 2 4 
8 +1 1 
7 — 0 0 
6 — 1 1 
5 — 2 4 
4 —3 9 
3 — 4 16 
2 — 5 25 
1 — 6 36 

Σx² = 182

Using the formula

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$$

we get σ = √(182/13) = √14 = 3.74.

Use of the simpler formula for calculating standard deviation:

$$\text{S.D.} = \frac{\tfrac{1}{6}H - \tfrac{1}{6}L}{\tfrac{1}{2}(N-1)}$$

Total scores are 13. One sixth of the scores comes to the two top
and the two bottom scores, i.e. 13 + 12 and 1 + 2.

$$\text{S.D.} = \frac{(13+12) - (1+2)}{\tfrac{1}{2}(13-1)} = \frac{25 - 3}{6} = 3.67$$

For most classroom tests this estimate is sufficiently
accurate, and the S.D. is quite easy to calculate by this formula.
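
The shortcut can be compared with the exact value in a small Python sketch. The denominator is taken here as half of (N - 1), following the worked example; that reading is an assumption, since the printed formula is damaged, and the function name is hypothetical.

```python
from math import sqrt


def shortcut_sd(scores):
    """Estimate the S.D. from the top and bottom sixths of the scores.

    Assumption: the difference of the two sums is divided by (N - 1) / 2.
    """
    ordered = sorted(scores)
    n = len(ordered)
    k = max(1, round(n / 6))          # number of scores in one sixth
    high = sum(ordered[-k:])
    low = sum(ordered[:k])
    return (high - low) / ((n - 1) / 2)


scores = list(range(1, 14))           # the 13 scores 1, 2, ..., 13
mean = sum(scores) / len(scores)
exact = sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))
estimate = shortcut_sd(scores)
print(round(exact, 2), round(estimate, 2))
```

For these 13 scores the exact value is √14 and the estimate is 22/6, so the two agree to within about 0.1.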

There are some important uses of standard deviation. 

It can be used: 




(a) to describe the relative homogeneity of two classes to
whom the same or comparable tests are given,

(b) to determine the composite score based on addition of two 
or more scores, e.g. on two or more unit tests. The unit 
test with larger S.D. determines a student's standing in a 
class on the composite score obtained, and 

(c) to derive comparable scores for a given class on two or 
more tests thereby helping the teachers to describe the 
relative quality of performance of students on different 
tests in relation to the mean of the distribution. 


Example

Test I        Test II
Mean = 30     Mean = 40
S.D. = 5      S.D. = 8

A student's score of 35 on Test I will be at the same relative
level on Test II if he gets 40 + 8 = 48, because 35 is one S.D.
above the mean (30) of the first test just as 48 is one S.D. above
the mean (40) of the second test.

The scores expressed as S.D. distances from a mean are
called z scores.


7. Derived Scores 


Raw scores cannot be used to compare an individual's performance
with a group on the same test or on a different test
because the unit on one test may not indicate the same increment
of performance as the same unit on another test. For this
reason raw scores are converted into derived scores, which can
be interpreted more easily and are more comparable from one
test to another and from one group to another.


Types of derived scores 


7.1 Standard Scores 


These scores use the mean and S.D. The z score is one type of standard
score, expressed as the distance, in S.D. units, by which the score
differs from the mean.

$$z = \frac{\text{Raw score} - \text{Mean}}{\text{S.D.}} \tag{7}$$




Example

Tests         Student    Score    z score
Test I        A          45       (45 - 30)/5 = 3.0
Mean = 30     B          30       (30 - 30)/5 = 0.0
S.D. = 5      C          25       (25 - 30)/5 = -1.0
Test II       A          50       (50 - 40)/10 = 1.0
Mean = 40     B          40       (40 - 40)/10 = 0.0
S.D. = 10     C          35       (35 - 40)/10 = -0.5

Thus student A performed much better on Test I (3.0) than
on Test II (1.0). The performance of student B is the same on both,
while student C fares a little better on Test II (-0.5) than on
Test I (-1.0). Teachers generally convert z scores to a
scale that uses whole numbers and avoids negative numbers. One of
the commonly used scales is the Z score, with a mean of 50 and
an S.D. of 10, which can be obtained from z scores by the formula

Z = 10z + 50

For our preceding example the Z scores are as follows:

Test I     (A) Z = 10 × 3.0 + 50 = 80
           (B) Z = 10 × 0 + 50 = 50
           (C) Z = 10 × (-1.0) + 50 = 40
Test II    (A) Z = 10 × 1.0 + 50 = 60
           (B) Z = 10 × 0 + 50 = 50
           (C) Z = 10 × (-0.5) + 50 = 45

In both types of computation the scores lead to the same
conclusion, differing only in their units, as under:

(Small) z score      Mean = 0     S.D. = 1
(Capital) Z score    Mean = 50    S.D. = 10
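
Formula (7) and the conversion Z = 10z + 50 can be sketched in Python as below; the function names are illustrative only.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in S.D. units (formula 7)."""
    return (raw - mean) / sd


def capital_z(z):
    """Rescale a z score to a scale with mean 50 and S.D. 10."""
    return 10 * z + 50


# (raw score, test mean, test S.D.) for students A, B and C on each test
test_1 = {"A": (45, 30, 5), "B": (30, 30, 5), "C": (25, 30, 5)}
test_2 = {"A": (50, 40, 10), "B": (40, 40, 10), "C": (35, 40, 10)}

for name, test in (("Test I", test_1), ("Test II", test_2)):
    for student, (raw, mean, sd) in test.items():
        z = z_score(raw, mean, sd)
        print(name, student, z, capital_z(z))
```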


7.2 Percentile Rank 


A percentile rank of a student indicates the percentage of
students whose scores are lower than his. If 80% of students
score lower than Amita, then Amita's percentile rank is 80.
Percentile rank (PR) can be calculated by dividing the number
of lower scores by the total number in the class and multiplying
the result by 100.

$$PR = \frac{\text{Number of lower scores}}{\text{Total number in the class}} \times 100 \tag{8}$$

Example (N = 40)

Score     No. of lower scores    PR
45        39                     (39/40) × 100 = 98
4[?]      20                     (20/40) × 100 = 50
[?]0      2                      (2/40) × 100 = 5
35        10                     (10/40) × 100 = 25
30        4                      (4/40) × 100 = 10


Students can understand percentile rank easily and teachers
can report test results meaningfully. Standard scores are more
useful for recording unit-wise test results or semester-wise results.
In fact both methods are complementary. Standard scores
describe how far a student's score is from the average, while the
PR describes his relative position in the class.

The median is the 50th PR because 50% of scores fall below it.
When N is small, the definition of PR may have to be refined.
Percentile rank is the score in the distribution below which a
given percentage of scores fall "plus one half the percentage of
space occupied by the given score". This can be understood by
the example given below:

Scores: 60, 67, 53, 49, 40 




In this case 53 is apparently the median, i.e. it occupies the 50th
PR. Thus 50% of scores should fall below it, but only two (49
and 40) out of 5 fall below it. This indicates that 53 has a PR
of 40 (not 50) because 2 out of 5 scores fall below it. However,
if we consider the addition in the definition, "plus one half the
percentage of space occupied by the given score", we can reconcile
the two as explained below:

Each score (5 in all) occupies 20% of the total space.

One half of this percentage (20%) = 10%.

Below 53 lie 40% of the scores (2 out of 5).

40% + 10% = 50%, which is the true percentile rank.
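
The simple and the refined definitions can be compared in a short Python sketch. The function name and the `refined` flag are hypothetical; where several students share the given score, the sketch adds half of their combined share, which is an assumption going slightly beyond the text.

```python
def percentile_rank(score, scores, refined=True):
    """Percentage of scores below `score`.

    With refined=True, one half of the space occupied by the given
    score itself is added, as in the refined definition.
    """
    below = sum(1 for s in scores if score > s)
    same = sum(1 for s in scores if s == score)
    share = 0.5 * same if refined else 0
    return 100 * (below + share) / len(scores)


scores = [60, 67, 53, 49, 40]
simple_pr = percentile_rank(53, scores, refined=False)
refined_pr = percentile_rank(53, scores)
print(simple_pr, refined_pr)
```

With only five scores the two definitions differ by ten points for the middle score; with a class of several hundred the difference shrinks to a fraction of a point.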

When N is large this qualification is unimportant because
the percentile ranks are rounded to the nearest whole number,
ranging from a PR of 99 to that of zero. Percentile rank can be
computed by the formula:

$$PR = 100 - \frac{(100R - 50)}{N} \tag{9}$$

where R is the rank from the top.


Example 

Roomi ranks 37th in his 10th class of 150 students. Therefore,
36 ranks fall above him and 113 below him. His percentile rank
will be

$$PR = 100 - \frac{(3700 - 50)}{150} = 100 - 24.3 = 75.7 \approx 76 \tag{10}$$
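
Formula (9) can be checked against the refined definition of percentile rank with a brief Python sketch (the function name is hypothetical). A student ranked R from the top of N has N - R students below him, and half of his own space adds 0.5, which gives the same value.

```python
def percentile_rank_from_rank(rank, n):
    """Formula (9): PR = 100 - (100R - 50) / N, with R counted from the top."""
    return 100 - (100 * rank - 50) / n


roomi = percentile_rank_from_rank(37, 150)
print(round(roomi, 1), round(roomi))

# Same value from the refined definition: 113 below, plus half of his own space.
from_definition = 100 * (113 + 0.5) / 150
print(round(from_definition, 1))
```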


7.3. Percentile Rank and Percentile Score 


A percentile is a score point in the score distribution below
which the stated percentage of all measures lies. A student
who scores at the 30th percentile of his class has done better than
30 per cent of the students and poorer than 70 per cent. It is not
to be confused with the percentage of the items or questions
correctly answered. A percentile rank, on the other hand, is the
percentage of scores that fall below a given score point. If a score
of 65 has a percentile rank of 80, it means
that 80 per cent of the scores fall below 65. Likewise, in the first
example, the PR of the student is 30.


8. Measures of Relationships 


8.1. Correlation 

Relationships between age and size, intelligence quotient and
achievement, height and weight, car size and fuel consumption,
reading test scores and tests on literature, and many other such
examples are quite common in everyday life. This type of
relationship has important applications in measurement. The
concept of correlation indicates the relationship between two or
more paired variables or between two or more sets of data.
The degree of relationship is measured and presented by the
coefficient of correlation. Measures of correlation are therefore:


(a) numerical indexes that reflect quantitatively the extent of 
relationship between two sets of scores or measures, and 

(b) the extent to which examinees hold the same relative posi- 
tion on the two sets of measures. 


When the relation is perfect and direct, numerical index is 
1.00. 

When there is no relationship existing, the numerical index 
is 0.00. 

When relationship is perfect but inverse the numerical index 
is — 1.00. 

Most of the measures of interest in the classroom will be 
between 0.00 and 1.00. 


(-1.00) High in one trait, low in the other
    (a) Age of automobile and sale value
    (b) Time spent in practice and number of spelling errors
    (c) Total wheat production and price per quintal

(0.00) Pure chance relationship. Lack of correlation
    (a) Body weight and intelligence
    (b) Achievement and height

(+1.00) High in one trait, high in the other
    (a) Intelligence and academic achievement
    (b) Height and shoe size
    (c) Interest and achievement
    (d) Productivity per acre and value of farmland




Content           Operations               Products
1. Figural        1. Cognition             1. Units
2. Symbolic       2. Memory                2. Classes
3. Semantic       3. Divergent thinking    3. Relations
4. Behavioural    4. Convergent thinking   4. Systems
                  5. Evaluation            5. Transformations
                                           6. Implications

Therefore 4 × 5 × 6 makes 120 cells or dimensions, each
representing a mental process or an ability. Out of these more
than 100 abilities have already been identified.


5. Cognitive Domain Taxonomies


(i) Bloom's Taxonomy (1956)
The first ever attempt to build a taxonomy was made by a
group of college examiners who were confronted with assessment
problems. They recognised the need to classify educational
objectives into three categories, called the domains. The cognitive
domain included recall and recognition of knowledge and
development of intellectual skills. The affective domain included
objectives relating to interest, attitudes and values, while
the psychomotor domain included observed voluntary action or
action patterns.

Bloom's taxonomy (1956) is based on four basic postulates 
(8): 

(a) Behaviours designated in the taxonomy are cognitive. 

(b) Behaviours are hierarchical in nature. 

(c) Behaviours are cumulative in nature. 

(d) Behaviours are learned behaviours. 


The taxonomy is arranged into six objectives ranging from the
simplest to the most complex. Each objective is further defined
into sub-categories followed by illustrative items. The taxonomy is
as under:

1.0 Knowledge        1.1 Knowledge of specifics
                     1.2 Knowledge of ways and means of
                         dealing with specifics
                     1.3 Knowledge of the universals and
                         abstractions in a field
2.0 Comprehension    2.1 Translation
                     2.2 Interpretation
                     2.3 Extrapolation
3.0 Application      3.1 Ability to use a theory, principle or
                         a method to solve a problem involving
                         a new or unfamiliar situation
4.0 Analysis         4.1 Analysis of elements
                     4.2 Analysis of relationships
                     4.3 Analysis of organisational principles
5.0 Synthesis        5.1 Production of unique communication
                     5.2 Production of a plan or proposed
                         set of operations
                     5.3 Derivation of a set of abstract
                         relations
6.0 Evaluation       6.1 Judgements in terms of internal
                         criteria
                     6.2 Judgements in terms of external
                         criteria


(ii) Gagne-Merrill's Taxonomy (1965)

This taxonomy integrates the cognitive, affective and psychomotor
domains. It follows the 'push down' principle, which emphasises
the acquisition of behaviour at the lower level before acquiring
it at the higher level. Eight types of hierarchical behaviours
are identified, the learning of one being conditioned by the
learning of those which are inferior in the structure. From the
easiest to the most complex these are as under (9):

1. Signal learning
2. Stimulus response learning
3. Chaining
4. Verbal association
5. Discrimination learning
6. Concept learning
7. Rule learning
8. Problem solving


(iii) Madaus (1973)
While working on the linear hierarchical structure of Bloom's
taxonomy, Madaus found a branching structure after the level




Rarely do we find perfect correlation, particularly in relation
to human traits. In a perfect positive correlation, for every unit
increase in one variable there is a proportional unit increase in
the other.

Likewise, in a perfect negative correlation, for every unit
increase in one variable there is a proportional unit decrease in
the other. Thus when two sets of scores correlate, the scores
co-vary. In reality, coefficients of +1.00 or -1.00 are not
encountered in human traits. The sign of the coefficient indicates
the direction of the relationship and the numerical value its
strength.


Example
We may examine the following scores on four tests to understand
the concept of correlation. Figures 1.1, 1.2 and 1.3 represent
the data given in Table 2.

TABLE 2

Student    Test A    Test B    Test C    Test D
1          5         16        28        26
2          10        18        26        18
3          15        20        24        24
4          20        22        22        22
5          25        24        20        28
6          [illegible]
7          35        28        16        [illegible]


[Fig. 1.3: scatter plot with Test D on the horizontal axis]
1. Test A and Test B (perfect positive correlation), Fig. 1.1
   (a) For every 5-point rise in Test A there is a 2-point rise in
       Test B.
   (b) Dots lie on the straight line.
   (c) The straight line rises diagonally from left to right.

2. Test A and Test C (perfect negative correlation), Fig. 1.2
   (a) Dots lie on the straight line.
   (b) The straight line falls diagonally from left to right.

3. Test A and Test D (zero correlation), Fig. 1.3
   (a) No covariance among the scores.

The degree of relationship between two variables is also depicted
by developing scatter-plots from the given data. The following
scatter-plots illustrate this. (10)


[Figs. 2.1 to 2.8: scatter plots illustrating different degrees of correlation; Figs. 2.7 and 2.8 show zero correlation]


From the above discussion we may understand that:

(a) A correlation coefficient indicates only a linear relationship
(or the lack of it). There can be zero correlation between two
variables that are related in some way other than a
linear fashion.

(b) The coefficient of correlation is only an estimate of the
correlation that actually exists.

(c) The sign (+ or -) has nothing to do with the strength of
correlation, which is reflected by the size of the correlation
coefficient. Correlations of +.57 and -.57 are of
exactly the same strength; the difference lies in the direction,
i.e. negative (inverse) or positive (direct).

(d) Correlation between two sets or variables does not imply
cause and effect. For example, a high correlation between
farm output and the consumption of alcohol does not
imply a cause-effect relationship. Growth in the economy would
cause increase in both.
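
Point (a) can be illustrated with a small Python sketch. The data are hypothetical: every y value is exactly the square of its x value, so the two variables are perfectly related, yet the coefficient computed from deviation scores is zero because the relationship is not linear.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient computed from deviation scores x = X - M."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy))


xs = [-3, -2, -1, 0, 1, 2, 3]       # hypothetical scores
ys = [x * x for x in xs]            # perfectly determined by x, but not linearly
curved = correlation(xs, ys)
straight = correlation(xs, [2 * x + 1 for x in xs])
print(curved, straight)
```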


How to compute correlation estimate? 
There are a number of methods to compute correlation. 


(i) Scatter plot method

The easiest method is to construct scatter plots as shown in
Figs. 2.1 to 2.8 and obtain a rough estimate of the size and
direction of correlation. Scattered dots indicate low correlation
and narrow bands show high correlations.


(ii) Diederich (1964) Tetrachoric correlation method (11)

Steps
1. Rank order the students on the basis of scores on Test 1.

2. Draw a line through the scores at the median.

3. Repeat Steps 1 and 2 for Test 2.

4. Find the total number of students whose
names appear in the top half of both lists.

5. Divide the number you get in Step 4 by the total number of
students taking both tests (i.e. to find the percentage of students
scoring in the top half on both tests).

6. Locate this percentage (Step 5) in Table 3 given on the next
page and note the corresponding correlation coefficient.
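
Steps 1 to 5 can be sketched in Python as follows. The eight students and their scores are hypothetical, the function name is illustrative, and ties at the median are not handled; the resulting percentage is then looked up in Table 3 as described in Step 6.

```python
def percent_top_half_on_both(test_1, test_2):
    """Percentage of students who fall in the top half on both tests.

    test_1 and test_2 map each student's name to a score.
    """
    def top_half(scores):
        ranked = sorted(scores, key=scores.get, reverse=True)   # Steps 1 and 3
        return set(ranked[:len(ranked) // 2])                   # Step 2: cut at the median

    both = top_half(test_1) & top_half(test_2)                  # Step 4
    return 100 * len(both) / len(test_1)                        # Step 5


# Hypothetical class of eight students
test_1 = {"A": 20, "B": 18, "C": 17, "D": 15, "E": 12, "F": 10, "G": 9, "H": 5}
test_2 = {"A": 19, "B": 11, "C": 16, "D": 14, "E": 18, "F": 8, "G": 7, "H": 6}

percent = percent_top_half_on_both(test_1, test_2)
print(percent)
```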


TABLE 3: Tetrachoric correlations

Per cent   r       Per cent   r       Per cent   r       Per cent   r       Per cent   r
45         .95     37         .69     29         .25     21         -.25    13         -.69
44         .93     36         .65     28         .19     20         -.31    12         -.73
43         .91     35         .60     27         .13     19         -.37    11         -.77
42         .88     34         .55     26         .07     18         -.43    10         -.81
41         .85     33         .49     25         .00     17         -.49    9          -.85
40         .81     32         .43     24         -.07    16         -.55    8          -.88
39         .77     31         .37     23         -.13    15         -.60    7          -.91
38         .73     30         .31     22         -.19    14         -.65    6          -.93

Source: Diederich, Short-cut Statistics for Teacher-made Tests (pp. 34-36).


(iii) Pearson's product moment method (r)
For most classroom work the two methods described
above are accurate enough. If a calculator is available, Pearson's
product moment correlation can be computed without much
effort by using the following formula, which is not difficult to
use:

$$r = \frac{N\Sigma xy - (\Sigma x)(\Sigma y)}{\sqrt{[N\Sigma x^2 - (\Sigma x)^2]\,[N\Sigma y^2 - (\Sigma y)^2]}} \tag{12}$$
Although the formula appears quite cumbersome, it is not so,
as you need only six kinds of information. These are:

1. Σx, i.e. the sum of the raw scores on Test 1.

2. Σx², i.e. the sum of the squared scores on Test 1.

3. Σy, i.e. the sum of the raw scores on Test 2.

4. Σy², i.e. the sum of the squared scores on Test 2.

5. Σxy, i.e. the sum of the cross products of the raw scores of
Test 1 and Test 2.

6. N, i.e. the number of students (not the scores).


од о TIO — 


an 
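
The six kinds of information can be gathered and combined in a few lines of Python. The sketch below is an illustration only: the function name and the two short lists of scores are assumptions made for the example and do not come from the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product moment correlation from raw scores."""
    n = len(x)                                   # number of students
    sum_x = sum(x)                               # sum of raw scores on Test 1
    sum_x2 = sum(v * v for v in x)               # sum of squared scores on Test 1
    sum_y = sum(y)                               # sum of raw scores on Test 2
    sum_y2 = sum(v * v for v in y)               # sum of squared scores on Test 2
    sum_xy = sum(a * b for a, b in zip(x, y))    # sum of cross products

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical scores of five students on two tests.
test1 = [1, 2, 3, 4, 5]
test2 = [2, 4, 5, 4, 5]
print(round(pearson_r(test1, test2), 2))  # 0.77
```

For these scores the six quantities are 15, 55, 20, 86, 66 and 5, so the numerator is 30 and the denominator is the square root of 1500.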


(iv) Spearman's rank order method (ρ) 
Spearman's rank order coefficient of correlation is also simple to 
compute by using the formula: 

$$ \rho\ (\text{Rho}) = 1 - \frac{6\sum D^2}{N(N^2 - 1)} \qquad (13) $$ 

where D = the difference between paired ranks 
ΣD² = the sum of squared differences between ranks 
N = number of paired ranks 


Example 
Two judges, Rao and Kamla, ranked 10 pupils in a debate 
contest as under. To what extent are their judgements in agreement? 

Pupil   Rao's ranking   Kamla's ranking   D   D² 
A       5               4                 1   1 
B       2               1                 1   1 
C       1               3                 2   4 
D       10              7                 3   9 
E       8               10                2   4 
F       9               8                 1   1 
G       6               5                 1   1 
H       4               6                 2   4 
I       7               9                 2   4 
J       3               2                 1   1 
N = 10                                        ΣD² = 30 

$$ \rho = 1 - \frac{6(30)}{10(100 - 1)} = 1 - \frac{180}{10(99)} = 1 - \frac{180}{990} = 1 - .18 = .82 $$ 

This means a high degree of agreement between the two 
judges. 
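
The same calculation can be checked with a short Python sketch. The function name is an assumption made for the illustration, and Rao's rank for pupil I is taken here as 7, the only value consistent with D = 2 and with each rank from 1 to 10 being used once.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank order coefficient from two lists of paired ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given to pupils A to J by the two judges.
rao = [5, 2, 1, 10, 8, 9, 6, 4, 7, 3]      # rank for pupil I assumed to be 7
kamla = [4, 1, 3, 7, 10, 8, 5, 6, 9, 2]

print(round(spearman_rho(rao, kamla), 2))  # 0.82
```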


(v) Wolfe's (adaptation) method (14) 
When a teacher is interested in finding the correlation between 
the scores of students on two unit tests, it can be calculated as 
under. Examine the example given below, showing the scores of 20 
students on Unit Test 1 and Unit Test 2, each marked out of 25. 


TABLE 4 

Pupil   Scores                      Classification 
        Unit test 1   Unit test 2   Unit test 1   Unit test 2 
1       2             3             4             5 

A       *24           *20           H             H 
B       20            11            -             - 
C       10            13            -             - 
D       7*            14            L             - 
E       15            15            -             - 
F       *22           *17           H             H 
G       13            10            -             - 
H       14            *19           -             H 
I       7*            5*            L             L 
J       17            16            -             - 
K       *23           16            H             - 
L       19            13            -             - 
M       12            11            -             - 
N       *21           *18           H             H 
O       6*            8*            L             L 
P       9*            *17           L             H 
Q       5*            8*            L             L 
R       17            16            -             - 
S       *24           6*            H             L 
T       18            8*            -             L 


How to proceed? 

Step 1: Decide how many students to include in the high and low 
groups. In usual classes of 20 to 30 it is all right to take one 
fourth, i.e. 5 scores in this example: (21, 22, 23, 24, 24) and 
(5, 6, 7, 7, 9) in Test 1 and (17, 17, 18, 19, 20) and (5, 6, 8, 8, 
8) in Test 2. 

Step 2: Cross mark on the left the five high scores and on the right 
the five low scores in each test. For example, high scores 24, 24, 
23, 22, 21 and low scores 5, 6, 7, 7, 9 in Test 1, and high scores 
20, 19, 18, 17, 17 and low scores 5, 6, 8, 8, 8 in Test 2 are 
identified. 

Step 3: Under columns 4 and 5 write the letter H for the 
scores that are among the highest five on each test, i.e. against the 
cross marks put on the left, under the respective columns. 

Step 4: Under columns 4 and 5 write the letter L for the 
scores that are among the lowest five on each unit test, i.e. against 
the cross marks put on the right, under the respective columns. 

Step 5: Prepare a table like the following, indicating by tallies 
or symbols the position of H and L in each of the four 
squares. 


                    Test 2 
                    H        L 
Test 1      H       |||      | 
            L       |        ||| 

In this checker board the 3 tallies in HH are of students A, F 
and N. In LL the three tallies are of I, O and Q. In HL the tally 
is of S and in LH it is of P. 

Step 6: Calculate the coefficient of correlation (r) by the 
formula: 

$$ r = \frac{(\text{Sum of HH and LL}) - (\text{Sum of HL and LH})}{2n} \qquad (15) $$ 

where n is the number in the H or L group (5 in this case), i.e. 
¼ of 20. 

Thus 

$$ r = \frac{(3 + 3) - (1 + 1)}{2 \times 5} = \frac{6 - 2}{10} = \frac{4}{10} = .40 $$ 
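
Steps 1 to 6 can also be followed in a short Python sketch. It is an illustration under stated assumptions: the function name is invented for the example, and the scores are those of Table 4, with illegible entries filled in from the lists of high and low scores given in Step 1.

```python
def wolfe_r(scores, group_size):
    """Wolfe's short-cut correlation from high and low groups on two tests.

    scores maps each pupil to a pair (score on Test 1, score on Test 2).
    """
    names = list(scores)

    def high_and_low(test_index):
        ranked = sorted(names, key=lambda name: scores[name][test_index])
        return set(ranked[-group_size:]), set(ranked[:group_size])

    high1, low1 = high_and_low(0)
    high2, low2 = high_and_low(1)

    hh = len(high1 & high2)   # high on both tests
    ll = len(low1 & low2)     # low on both tests
    hl = len(high1 & low2)    # high on Test 1, low on Test 2
    lh = len(low1 & high2)    # low on Test 1, high on Test 2
    return ((hh + ll) - (hl + lh)) / (2 * group_size)


# Scores of pupils A to T as read from Table 4 (assumed where illegible).
table4 = {
    "A": (24, 20), "B": (20, 11), "C": (10, 13), "D": (7, 14), "E": (15, 15),
    "F": (22, 17), "G": (13, 10), "H": (14, 19), "I": (7, 5), "J": (17, 16),
    "K": (23, 16), "L": (19, 13), "M": (12, 11), "N": (21, 18), "O": (6, 8),
    "P": (9, 17), "Q": (5, 8), "R": (17, 16), "S": (24, 6), "T": (18, 8),
}

print(wolfe_r(table4, group_size=5))  # 0.4
```

The sketch takes the top and bottom five after sorting, so it assumes, as is true for these scores, that no tie falls exactly on the boundary of a group.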


Understandings 

1. The coefficient of correlation shows the extent to which high and 
low scores on one test are associated with high and low scores on the 
second test. The closer the index is to ±1, the stronger the relationship. 

2. When r is more than ±.90 the relationship is quite strong, … .60. 

3. A correlation coefficient does not reflect a proportion of a 
perfect relationship (±1.0), i.e. an r of .75 is not ¾ of ±1.0. 

4. Correlation coefficients do not show the proportion of high 
performers on one test who will score high on the second. 

5. Correlation is only a description of general tendencies, e.g. 
when r = .60 between Test I and Test II it indicates that there 
is a tendency for students who score high on one test to score 
high on the other. 


9. Normal Probability Curve 


It is a bell shaped curve which is unimodal and symmetrical. 
It is unimodal because scores cluster round a single point. It is 
symmetrical because if the left half is folded over the 
right half at the mean there would be an exact fit, the two halves 
being mirror images of each other, as shown in the diagram below. 

Do the "tails" touch the base line? Do they continue to 
approach it more closely as one moves further from the mean? 


[Figure: normal curve, with the vertical axis drawn through the point 
where the mean, median and mode coincide] 

A normal curve represents the way many human characteristics 
are distributed. A normal curve is a mathematical model. 
Since this curve is precisely defined, we can compute the proportion 
of scores falling under any area of the curve. We have already learnt 
that in a normal distribution the mean, median and mode lie at the 
same point, i.e. they are identical in value. 

If we divide the base line by marking off, to the left and right of 
the mean, units of one standard deviation, the normal distribution spans 
approximately 3 standard deviations on either side. There are 
very few scores more than 3 units below or above the mean, as 
seen in the diagram. The curve being symmetrical, the area to the 
right of the mean is equal to the area to the left of the mean. Thus 
the area between the mean and 1σ above the mean is the same as the 
area between the mean and 1σ below it. The height of the curve at a 
given point indicates the frequency of scores at that point. Therefore, 
there are as many scores one standard deviation above the mean as 
there are one standard deviation below the mean. 

The area of the curve between any two points represents the 
number or frequency of scores falling between those two points 
in a normal distribution. The scores falling: 

(a) between the mean and 1σ above the mean = 34.13% 
(b) between the mean and 1σ below the mean = 34.13% 
(c) between mean − 1σ and mean + 1σ = 68.26% 
(d) beyond + 1σ and − 1σ cover an area = ⅓ of the scores 
(e) below the mean − 1σ = 1/6 of the total scores 
(f) above the mean + 1σ = 1/6 of the total scores. 
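
These areas can be verified with the standard library of Python. The sketch below is an illustration only; the function names are assumptions made for the example, and the areas are obtained from the error function `math.erf` rather than from a printed table.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Proportion of a normal distribution lying below z standard deviations."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def area_between(z_low, z_high):
    """Proportion of scores between two points given in standard deviation units."""
    return normal_cdf(z_high) - normal_cdf(z_low)


print(round(100 * area_between(0, 1), 2))   # mean to +1 SD: 34.13
print(round(100 * area_between(-1, 1), 2))  # -1 SD to +1 SD: 68.27
print(round(1 - area_between(-1, 1), 3))    # beyond 1 SD on either side: 0.317
print(round(normal_cdf(-1), 3))             # below -1 SD: 0.159
```

The second figure comes out as 68.27 rather than 68.26 only because 34.13 is itself a rounded value; 0.317 and 0.159 are the exact counterparts of the rough fractions ⅓ and 1/6.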


This curve is so precisely defined that we can compute the 
proportion of scores falling under any area of the curve, as 
illustrated in the following figure as depicted by TenBrink (16). 

[Figure: Per cent of cases under portions of the normal curve, with 
parallel scales for standard deviations, percentiles, Z-scores, 
T-scores, CEEB scores and stanines. Per cent in stanines 1 to 9: 
4, 7, 12, 17, 20, 17, 12, 7, 4. (Copied from p. 460 of TenBrink)] 


Properties of Normal Curve 
1. With the help of the equation of the normal probability curve 

$$ y = \frac{N}{\sigma\sqrt{2\pi}}\; e^{-x^2 / 2\sigma^2} $$ 

when N and σ are known it is possible to compute 

(a) the frequency (or y) of a given value x and 
(b) the number or percentage of cases between two points, or above or 
below a given point, in the distribution. 

In this equation: 

x = scores (expressed as deviations from the mean) along the base line or X-axis 
y = height of the curve above the X-axis, i.e. frequency of a 
given x-value 
N = number of cases 
σ = standard deviation 
π = 3.1416 (constant) 
e = 2.7183 (constant) 

2. The total area under the curve is taken arbitrarily to be 10,000 
cases. The curve may be taken to end at points −3σ and +3σ 
distant from the mean; it then covers 9973 cases or 99.73% of 
the entire distribution and disregards .27 of 1% of the distribution. 

3. Mean, median and mode all fall exactly at the mid-point 
and are numerically equal. 
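
The equation of the curve and the figure of 9973 cases can both be checked in Python. This is a sketch for illustration: the function names are assumptions, x is taken as a deviation from the mean, and the area is again obtained from `math.erf`.

```python
from math import erf, exp, pi, sqrt


def ordinate(x, n_cases, sd):
    """Height y of the normal curve at a deviation x from the mean."""
    return n_cases / (sd * sqrt(2 * pi)) * exp(-x ** 2 / (2 * sd ** 2))


def cases_within(k_sd, n_cases):
    """Number of cases lying within k_sd standard deviations of the mean."""
    return n_cases * erf(k_sd / sqrt(2))


print(round(ordinate(0, 10000, 1)))   # height of the curve at the mean: 3989
print(round(cases_within(3, 10000)))  # cases between -3 SD and +3 SD: 9973
```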


10. Reliability 


The accuracy or precision with which a test measures whatever 
it measures is called reliability. No test is completely reliable 
(1.0); there is always some error of measurement. So the true 
score is always contaminated with the error component. The 
observed score, therefore, consists of the true score and the error 
component. 

X = T + E 
where X = raw score 
T = true score 
E = error score (+ve or −ve) 


T and E are never known for any student. They are inferred 
theoretically on the basis of test taking behaviour. 

Error of measurement is the result of the use of imperfect 
instruments of evaluation and not of the mistakes that creep in as 
a result of handling test scores. Errors of measurement may be 
due to: 

(a) the sample of items used in the test, 
(b) variation in the physical health and external conditions of the 
examinee, 
(c) chance or luck in guessing, 
(d) scoring, in the case of essay type questions. 

Therefore, the test, the conditions and the examinee are the 
three primary sources of the errors that creep in. Students' 
scores differ due to individual differences in the traits which the 
test measures and also due to errors of measurement. The 
degree to which scores differ is described by the S.D. of the test, 
but the square of the S.D., i.e. the variance, is more useful to depict 
the relative contribution of the true scores and error scores to the 
apparent differences among students' performances on a test. 
The variance of observed scores can be defined as the sum of 
the variance of the true scores and the variance of the error scores, 
as expressed below: 

$$ V_o = V_t + V_e $$ 

where $V_o$ = variance of observed scores 
$V_t$ = variance of true scores 
$V_e$ = variance of error scores 


$V_e$ will be zero on an ideal test, and so there will be no error of 
measurement. Since the reliability coefficient is an indicator of the 
accuracy of scores on a test, we may define the coefficient of reliability as: 

$$ r_{tt} = \frac{V_o - V_e}{V_o} $$ 

where $r_{tt}$ = coefficient of reliability 
$V_o$ = variance of observed scores (S.D.)² 
$V_e$ = variance of error scores 


From this we can note that the coefficient of reliability is the 
proportion of observed score variance which is due to true score 
variance. The higher this proportion, the more reliable the test 
would be. However, the reliability coefficient cannot be computed 
directly from the above formula, since the error variance is never known. 


10.1 Methods of Computing Reliability Coefficients 
Following methods can be applied by the classroom teachers: 


(i) Split-half 

(a) A test is divided into two halves, usually odd numbered 
items versus even numbered items. 

(b) The two halves are scored separately for each student. 

(c) The correlation coefficient is computed between the scores 
on the two halves. 

(d) The estimate of the reliability coefficient is obtained by using 
the Spearman-Brown formula: 

$$ r_{tt} = \frac{2\, r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}} \qquad (16) $$ 

where $r_{tt}$ = reliability coefficient of the total test 
$r_{\frac{1}{2}\frac{1}{2}}$ = coefficient of correlation between the two half-length tests 

Example 

Suppose your split-half correlation is .70. Its reliability 
coefficient will be 

$$ r_{tt} = \frac{2 \times .70}{1 + .70} = \frac{1.40}{1.70} = .82 $$ 

Split-half reliability can be used for essay type tests also, by 
obtaining separate scores on odd and even questions, e.g. 1, 3, 5 
and 2, 4, 6. 
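
The split into halves and the Spearman-Brown step-up can be sketched in Python as follows. The function names and the six item marks are assumptions made for the illustration; the correlation between the two sets of half scores would be computed separately by any of the methods described earlier.

```python
def half_scores(item_marks):
    """Split one student's item marks into odd-item and even-item totals."""
    odd_total = sum(item_marks[0::2])    # items 1, 3, 5, ...
    even_total = sum(item_marks[1::2])   # items 2, 4, 6, ...
    return odd_total, even_total


def spearman_brown(r_half):
    """Reliability of the full test from the correlation between its halves."""
    return 2 * r_half / (1 + r_half)


print(half_scores([1, 0, 1, 1, 0, 1]))   # (2, 2) for a hypothetical student
print(round(spearman_brown(0.70), 2))    # 0.82
```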


(ii) Test-retest 
(a) A test is given and its scores are recorded. 
(b) The same test is given again to the same students at another 
time, after a few days. 
(c) Test-retest reliability is the correlation between the scores 
on the first testing and those on the second testing. 


(iii) Kuder-Richardson formula-20 

In case of objective type questions, where each item is right 
or wrong, i.e. 1 or 0 mark, this formula can be used to estimate 
the internal consistency. The formula used is: 

$$ r_{tt} = \frac{n}{n - 1}\left(\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}\right) $$ 

where $r_{tt}$ = reliability of the total test 
n = number of items in the test 
$\sigma_t$ = S.D. of the test scores 
p = proportion of the group answering a test item 
correctly 
q = proportion of the group answering a test item 
incorrectly (1 − p). 
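
A minimal Python sketch of formula-20 is given below. The function name and the small table of responses are assumptions made for the example, and the variance of the total scores is taken as the population variance (division by the number of students), which is also an assumption.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula-20 for items marked 1 (right) or 0 (wrong).

    responses holds one list of item marks per student.
    """
    n_students = len(responses)
    n_items = len(responses[0])

    totals = [sum(row) for row in responses]
    variance_total = pvariance(totals)           # square of the S.D. of test scores

    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_students  # proportion correct
        sum_pq += p * (1 - p)                                 # q = 1 - p

    return n_items / (n_items - 1) * (variance_total - sum_pq) / variance_total


# Hypothetical marks of five students on a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```

Here the total scores 4, 3, 2, 1, 0 have a variance of 2, the sum of pq over the four items is .80, and so the coefficient is (4/3) × (1.2/2) = .80.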


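(iv) Simplified Kuder-Richardson formula-21 

$$ r_{tt} = 1 - \frac{\bar{X}(K - \bar{X})}{K\,(SD)^2} $$ 

where $\bar{X}$ = mean 
K = number of items in the test 
SD² = variance (standard deviation squared) 

Example 
If $\bar{X}$ = 40, S.D. = 5, K = 60 

$$ r_{tt} = 1 - \frac{40(60 - 40)}{60(5)^2} = 1 - \frac{40 \times 20}{60 \times 25} = 1 - \frac{800}{1500} = 1 - \frac{8}{15} = 1 - .53 = .47 $$ 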
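
The worked example can be reproduced with a one-line function. The sketch below follows the simplified form used in the example; the function name is an assumption, and it may be noted that formula-21 is usually written with an additional factor K/(K − 1), which the simplified form leaves out.

```python
def kr21_simplified(mean, sd, k):
    """Simplified Kuder-Richardson formula-21 as used in the worked example."""
    return 1 - mean * (k - mean) / (k * sd ** 2)


print(round(kr21_simplified(mean=40, sd=5, k=60), 2))  # 0.47
```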


(v) Saupe method 
J.L. Saupe has given the following formula, which is even 
simpler, for approximating Kuder-Richardson results. 

$$ r_{tt} = 1 - \frac{.19K}{(S.D.)^2} $$ 

where $r_{tt}$ = reliability coefficient 
SD² = variance 
K = number of items in the test 
.19 = constant 

Example 
Suppose K = 60, S.D. = 5 

$$ r_{tt} = 1 - \frac{0.19 \times 60}{25} = 1 - \frac{11.40}{25} \approx 1 - .45 = .55 $$ 
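
Saupe's approximation is equally short in Python. The function name is an assumption made for the illustration. Carried through without intermediate rounding, the example gives about .54; the figure of .55 arises when 11.40/25 is taken as .45.

```python
def saupe_reliability(sd, k):
    """Saupe's approximation to the Kuder-Richardson reliability."""
    return 1 - 0.19 * k / sd ** 2


print(round(saupe_reliability(sd=5, k=60), 3))  # 0.544
```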


Since different methods involve slightly different assumptions 
about the nature of data and reliability there would be some 
variation in the results obtained from different methods. Saupe's 
formula yields lower estimates than the Split-half method 
although the two results do not differ appreciably. 

Saupe's formula is neither suitable for tests which are not 
scored on 1—0 basis for correct and incorrect answer nor for 
those which are very easy or very difficult. Split-half method 
is suitable for those tests which assign equal number of marks 
or points for each correct answer as in case of multiple choice 
tests and attitude scales. Neither of the two methods is suited 
for essay tests and where speed is a primary factor. 

Reliability is only an estimate and should be interpreted 
quite cautiously. It refers to the group tested. The same test, if 
used for a different group, will have a different connotation as far 
as interpretation is concerned. The reliability coefficient of a 
test will be much higher with a group whose mean score is 72 
as compared to the one with a mean score of 35. Factors affecting 
reliability in case of objective type, short answer and essay 
type tests will be discussed in more detail in the chapter dealing with 
the qualities of a good test. 


11. Standard Error of Measurement 


Reliability and standard error of measurement are inter-related 
concepts. Reliability indicates freedom from error while the 
standard error of measurement estimates the amount of error 
that does exist. While reliability is used as an estimate of the 
accuracy of the results of measurement as a whole, the standard 
error of measurement is used as an estimate of the accuracy of 
the result for a given individual. Measurement error occurs 
when the observed score of a student differs from his true score; 
as we learnt earlier, X = T + E. The variance of the observed scores 
($V_o$) is the sum of the variance of the true scores ($V_t$) and the 
variance of error ($V_e$). 
Thus $V_o = V_t + V_e$. 

For any one student, the true score and the error score cannot be 
known directly, but the standard deviation of the error scores can 
be estimated, on the basis of which individual performance can 
be interpreted. This standard deviation of the errors of measurement 
is called the standard error of measurement (SEM). It 
can be estimated by using the following formula: 

$$ SEM = SD\sqrt{1 - r_{tt}} \qquad (19) $$ 

where SEM = standard error of measurement (also called 
error of an obtained score) 
SD = standard deviation of the test scores 
$r_{tt}$ = reliability coefficient of the test 


Example 
In our earlier example SD = 5 and $r_{tt}$ = 0.55 

$$ SEM = SD\sqrt{1 - r_{tt}} = 5\sqrt{1 - .55} = 5\sqrt{.45} = 5 \times .67 = 3.35 $$ 

This SEM of 3.35 means that the standard deviation of the errors 
of measurement on this test is 3.35. This means that the average 
distance between the observed scores and the true scores is 
estimated to be 3.35. Thus if Anjum's obtained score is 58, 
what will be her true score? Assuming that we did not commit 
greater errors in her score, Anjum's true score is no more than 
3.35 points (plus or minus) from her observed score of 58. This 
would mean that her true score lies somewhere between 54.65 
and 61.35 (58 ± 3.35 = 54.65 to 61.35). 
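
The computation of the SEM can be written as a small Python function. The name of the function is an assumption made for the illustration; the figures are those of the example, SD = 5 and a reliability of .55.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD multiplied by the square root of (1 - reliability)."""
    return sd * sqrt(1 - reliability)


print(round(standard_error_of_measurement(sd=5, reliability=0.55), 2))  # 3.35
```

A perfectly reliable test (coefficient 1.0) would give an SEM of zero, while a coefficient of zero would make the SEM as large as the S.D. itself.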


Lord's Method of Calculating SEM 
If the test is neither extremely easy nor extremely difficult and 
admits only one correct answer for each item, thereby allowing 
1 or zero mark for each correct/incorrect response, Lord's 
method of calculating the standard error of measurement can be 
used and is quite simple. 

$$ SEM = 0.432\sqrt{K} \qquad (20) $$ 

where K = number of items 
0.432 = a constant 

Thus for a test of 60 items 

$$ SEM = 0.432\sqrt{60} = .432 \times 7.75 = 3.35 $$ 

This value is almost the same as that calculated by the first 
method. 

Readymade tables are available to estimate the SEM 
directly from the number of items (K). A few SEM figures 
corresponding to the number of items are given below for ready 
reference. 


Item no.   SEM     Item no.   SEM     Item no.   SEM 
(K)                (K)                (K) 

10         1.4     45         2.9     75         3.7 
15         1.7     50         3.1     80         3.9 
20         1.9     55         3.2     85         4.0 
25         2.2     60         3.4     90         4.1 
30         2.4     65         3.5     95         4.2 
40         2.7     70         3.6     100        4.3 
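
Such a table can be generated directly from Lord's formula. The sketch below is an illustration; the function name and the choice of test lengths printed are assumptions made for the example.

```python
from math import sqrt


def lord_sem(k):
    """Lord's estimate of the SEM from the number of items K."""
    return 0.432 * sqrt(k)


# Print K and the estimated SEM, rounded to one decimal place.
for k in range(10, 101, 10):
    print(k, round(lord_sem(k), 1))
```

For K = 60 the function returns about 3.35, the value obtained in the worked example.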


Errors of measurement are normally distributed according 
to test theory. On the basis of this theory we can predict that 
a student's observed score will be no more than 

(a) two standard errors of measurement higher or lower than 
his true score in 95% of the cases 
(b) one standard error greater or less than his true score in 
68% of the cases. 

Thus the standard error of measurement can be used to 
establish confidence limits for a student's score, within which the 
student's true score is likely to fall. In our previous example 
used to explain the SEM of the test, the 68% confidence limit is 
58 ± 3.35, i.e. 54.65 lower limit and 61.35 upper limit. This means 
that there are 68 chances out of 100 that a student with 58 marks 
(observed score) on the test will have a true score between 54.65 
and 61.35. This concept uncovers the fallibility of test scores, 
of which a test user must be aware when describing a 
student's status. 
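
The confidence limits can be worked out with a short Python sketch. The function name is an assumption made for the illustration; the observed score of 58 and the SEM of 3.35 come from the earlier calculation 5 × .67 = 3.35.

```python
def confidence_limits(observed, sem, number_of_sems=1):
    """Lower and upper limits around an observed score.

    One SEM either side gives about a 68% band, two SEMs about a 95% band.
    """
    margin = number_of_sems * sem
    return observed - margin, observed + margin


low68, high68 = confidence_limits(58, 3.35)
low95, high95 = confidence_limits(58, 3.35, number_of_sems=2)
print(round(low68, 2), round(high68, 2))  # 54.65 61.35
print(round(low95, 2), round(high95, 2))  # 51.3 64.7
```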

For the assignment of grades, each category should be 3 
standard errors of measurement wide. This 
avoids misclassification of students by more than one grade 
category. A grade category is generally one standard deviation 
in width. A classroom test should therefore have a reliability 
coefficient of at least .89 for all the five grade categories normally 
used (A, B, C, D, F). 


12. Standard Error of Estimate 


The validity of a test is the extent to which it measures what 
it intends to measure. It indicates the extent to which the intended 
purpose of a test is served. Whereas validity indicates 
how free from error our judgements are, the standard error 
of estimate estimates the amount of error we do make 
while making a judgement. While validity is used as an 
estimate of the accuracy of making particular judgements, the 
standard error of estimate is used for estimating the accuracy 
of a particular judgement for a given individual. 

Unlike SEM which indicates … 

… titude test is 4.5. We can now … standard error of estimate, that the 
counsellor is reasonably certain that Anita would score between 
85 ± 4.5, i.e. 80.5 to 89.5 marks. 

From this discussion of SEM and standard error of estimate 
it is evident that these two concepts are very useful in estimating 
the confidence levels or bands presenting the lower and 
upper limits of true scores, thereby providing the needed basis 
for meaningful interpretation of the observed scores that accrue in 
classroom tests. 




13. To Sum Up 


In this chapter we studied the basic statistical concepts which 
are pre-requisite to the understanding of measurement concepts 
in the forthcoming chapters. How scores obtained by the use of 
classroom tests can be made more meaningful by applying simple 
basic statistical concepts was discussed. Organisation of data 
into frequency distributions and calculation of measures of central 
tendency, percentile range, standard deviation and quartile deviation, 
followed by measures of relationship like correlation, were briefly 
discussed in non-technical language. Calculation of reliability and 
standard errors of measurement was discussed along with their 
implications. A deliberate attempt has been made to clarify these 
concepts with the help of examples. Simpler and more usable formulae, 
involving less technical and mathematical know-how, which can be 
applied by an average classroom teacher, are given. The major purpose of 
including these fundamental concepts before the actual text on 
measurement and evaluation is that the reader would be able to 
comprehend better the explanations that follow in forthcoming 
chapters, and appreciate how simple basic statistical concepts 
are useful in manipulating data to facilitate meaningful 
interpretations. 


CHAPTER IV 


INSTRUCTIONAL OBJECTIVES AND 
EVALUATION 


1. Introduction 


More often than not the words objective, goal and aim are used 
interchangeably in educational literature. This not only causes 
confusion for students of evaluation in understanding the 
process of education but also creates difficulty in fixing time 
targets for instructional outputs at various levels, which become 
guide posts for teaching-learning sequences and strategies. To 
what extent has an instructional programme succeeded? What 
are the indicators of success or failure? To answer 
these questions, one has to look for some reference point or 
criterion against which the outcomes of instruction can be 
judged for success or failure. This reference point or 
criterion is the 'objective', 'goal' or 'aim' for which instruction 
was planned. It is, therefore, necessary that first of all we 
should be clear about the connotation of these words. Once 
this is understood, further details about their sources, levels, 
classification, attributes, types, statement, specification and 
role in the teaching-learning process can be better appreciated. 

As a result of the study of this chapter you should be able 
to: 


(a) recognise the difference in nature and scope of the words 
objective, goal and aim, 

(b) derive objectives from various sources like learner, society 
and knowledge, 

(c) identify objectives at various levels from national to 
classroom levels, 

(d) classify objectives into various categories according to 
different taxonomies, 

(e) detect errors in the statement of various objectives, 

(f) define different objectives in terms of behavioural outcomes, 

(g) analyse the implications of criterion-referenced and norm-referenced 
objectives in evaluation, 

(h) establish the relationship between objectives, learning 
experience and evaluation, 

(i) appreciate the role of objectives in teaching, learning and 
testing. 


2. Nature of Objectives 


An objective is a normative concept which carries with it the 
idea of goodness or the desirable. What is desirable, depends 
on what we consider good for the learner. Who decides what 
is good and desirable? Society sanctions the desirability or 
goodness of the behaviour of an individual. This in turn 
depends on the values that the society cherishes. Thus what 
sort of individual or learner do we visualise as a result of our 
having educated him through the instructional programme? 
This means particular types of behaviours or desirable changes 
that we can expect our students to acquire after undergoing 
education. It is these intended changes or learning outcomes 
or expected modes of behaviours that we like to appraise also 
in the pupils in order to judge the effectiveness of the educa- 
tional process. 

If society considers manual work and the dignity of labour 
as a cherished value in a secular democracy, this value is made 
explicit in the form of objectives that are stated in the course of 
study as 'development of the dignity of labour among students'. 
Socially useful productive work then becomes a new discipline 
or a course of study to achieve that objective. Therefore, the 
intended product of learning and not the content of learning 
is reflected in an objective. Although an objective refers to 
values which are judged as desirable and are given priority for 
transmission through the educational process, yet it is not a 
value by itself. It is a product of value judgement. Likewise, 
an activity, content, a learning situation or experience, a 
process of learning, the medium of learning etc. are all means 
towards the attainment of intended or expected learning outcomes 
but are themselves not the end product of learning. Thus an 
objective is concerned with the product of learning rather than 
with the process of learning. The difference between the process 
and product of learning can be appreciated better by the following 
example, given in a schematic manner based on Gronlund (1). 


When Learner --(Undergoes)--> Planned Learning Experiences --(Leads to)--> Intended Learning Outcomes 
                              (Process)                                    (Product) 

A. Study of Petunia flower in the Laboratory 

Process                                   Product 
(a) Exposition by teacher                 (a) Knowledge of floral parts of Petunia 
(b) Pupil teacher discussion              (b) Ability to describe flower in semi-technical terms 
(c) Pupils' observations                  (c) Skill in observing 
(d) Section cutting by students and       (d) Skill in section cutting 
    demonstration by teacher 
(e) Drawing diagrams by students          (e) Skill in drawing diagrams of flower 


From this illustration it is obvious that 


(i) objectives indicate direction of students’ growth (a— c). 
(ii) objectives vary in their complexity; a is simpler than b. 

(iii) certain objectives (c to e) can be developed through varied 
content (not only Petunia flower) while others (a) have 
specific content. 

(iv) certain activities (Process) lead to the development of more 
than one objective, e.g. observation leads towards objectives 
a, b and c. 

(v) more than one activity may lead towards the same objec- 
tives e.g. activities a, b, c and e lead towards objective a. 


Therefore, an objective represents the end point towards 
which activities are directed. Objectives reflect the purposefulness of 
the teaching-learning process. 

To sum up 

An objective is not 
1. a life value but a product of value judgement. 
2. a process but a product of learning. 
3. an activity but a learning outcome. 
4. a method of teaching but an expected outcome of instruction. 
5. content of learning but an outcome of learning. 


2.1. Goals and Objectives 


Both these terms are often used synonymously. However, 
educational planners and evaluators use the term 'goal' as a 
general statement of desired outcomes, aims or purposes, 
having long-range implications and involving complex human 
behaviours. Goals are useful in stating the purposes of 
education and of a curriculum, in expressing a view point, in 
identifying priorities for a policy statement and in communicating 
with the layman. Statement of goals helps teachers in 
communicating programme goals to administrators, students 
and parents, besides conceptualising the desired outcomes in 
proper perspective. 
Following are some goal statements: 


(a) To develop a wholesome personality. 
(b) To develop physical fitness. 

(c) To develop a literate citizenry. 

(d) To develop ethical and spiritual values. 


Since these statements are global it is difficult to know from 
direct observations whether these goals are achieved. 

An aim is also a general statement which reflects a specific 
focus, a level of attainment or a point towards which action is 
directed. Goals are ultimate aims while specific aims may be 
called objectives. Likewise general objectives are aims. Aims 
are long-ranged while objectives are short-ranged when 
compared to each other. Even (instructional) objectives may 
be terminal or short-term, depending upon the time required 
to achieve them. Therefore, to reach a goal one 
may aim at a particular level followed by another level till one 
reaches the goal. In order to achieve a pre-determined aim 
one may have proximate aims or objectives, the attainment of 
which becomes pre-requisite to achieving the aim. Thus 
objectives are specific aims with more precisely defined targets, 
achievable within an observable time limit. These objectives 
can be further classified as terminal and short-term 
instructional objectives. The following examples will clarify 
further the relationship among goals, aims and objectives. 


Goal: To develop a literate citizenry for Indian democracy. 

Aim: To universalise primary education. 

Terminal Objective: By the end of the year 1980, 20% of the students 
will be able to read and write simple sentences in their mother tongue. 

Short-term Objective: Students will write their name correctly at the 
end of two weeks. 


From this relationship we can infer that 


(a) Objectives, aims and goals reflect the end product of 
learning at various levels of attainment. 

(b) There is hierarchy and complexity in the behavioural outcomes 
as we proceed from short-ranged objectives to long-ranged 
objectives, aims and goals. 

(c) The higher the level of statement, the more global the objective 
becomes and the more difficult it is to identify observable 
outcomes. 

(d) The more comprehensive or global the statement, the 
more areas or disciplines it encompasses for its attainment. 


2.2. Objective as Ends and Means 


From the relationship between the three terms it is obvious that 
whatever the level of the product of learning or learning outcome 
may be, a hierarchical sequence is present in their attainment. 
In other words, lower level outcomes are achieved first 
before proceeding to the next higher level. This means that an 
objective at Level 1 becomes a means to the attainment of an 
objective at Level 2, which in turn becomes a means to the 
attainment of an objective at Level 3. Therefore, objectives 
indicate intended learning outcomes at different levels, thereby 
providing direction to pupils' growth. They provide a basis 
for planning and organisation of learning experiences and 
selection of evaluation instruments. It is through objectives 
that a link is established between teachers, evaluators, parents 
and students by focussing their attention on the intended 
product of learning. 


3. Sources and Derivation of Objectives 


The sources of objectives can be traced to various empirical 
approaches or to the significance of behaviour considered 
desirable in a particular social milieu. Both these approaches are 
discussed below. 


3.1. Empirical Approach 


Whenever we think in terms of the desirability of objectives, 
we are looking for behaviours which are desirable or acceptable. 
Which behaviours should be expected from a student or 
a worker who has undergone a particular course of instruction? 
Furst (2) refers to various approaches, like job analysis, the 
critical-incident technique, diagnostic studies etc., that are used for 
this purpose. In job analysis, those activities are identified that 
help the performer to do the job well. These activities vis-à-vis 
the behaviours become the objectives of instruction. Likewise in 
the critical-incident technique, the significant or critical behaviours 
are noted during the course of the activities undertaken by 
the performer. Such outstanding behaviours are supposed to be 
desirable. It is assumed that when such behaviours are acquired 
by the learner, they would enable him to do his task well. 
Such critical behaviours are chosen as the intended behaviours 
or the expected learning outcomes. In the case of diagnostic studies, 
the behaviours of good performers are compared with those of 
poor performers, on the basis of which a list of those behaviours 
that determine good performance on the job is drawn up. 
This list of intended behaviours then becomes the basis for 
instructional programmes. 

Obviously, all the above mentioned methods depend upon 
certain tools and techniques which become our guide posts. 
This approach makes an evaluator the servant while the tools 
and techniques act as his master. In fact, it is not the type of 
activity or the tool but it is the significance of the activities that 
should matter. Which activity is desirable, involves a question 
of values. Therefore, the choice of activities depends on what 
is considered good and what is desirable. Which activity is 
good and acceptable is determined by society. Activities like 
thieving are not considered socially desirable and therefore, 
cannot be taken as one of the objectives in the process of learn- 
ing. Activities which promote intended behaviours representing 
preferred values, determine the choice of the desirable and are 
made explicit in the form of educational objectives. 


3.2. Study of Different Sources 


Although the formulation of objectives depends on value
judgements, objectives cannot wisely be obtained by individuals or
committees making such judgements off the cuff or by accepting
long-standing practices, but only by critically examining different sources.
Three fundamental sources for deriving objectives are the learner,
the society and the fund of knowledge in different subjects.


(a) Needs of the learner

The most important source of objectives is information about
students. What are their needs and abilities? What are the
understandings, habits, attitudes and interests that the students
need to develop? How do students learn better? What are the
things that they can or cannot learn at a particular maturity
level? Such factors have to be considered while determining
the end product of learning. Efforts are to be made to
impart knowledge, develop understandings, cultivate attitudes
and habits and capitalise on basic interests. Thus a
careful study of the learner and his needs provides good information
for deriving objectives of the curriculum.


Instructional Objectives and Evaluation 111 


(b) Demands of society 

Investigation of contemporary life indicates the demands and 
commands of the society. It reflects constraints and problems 
which the young pupils have to overcome by meeting the chang- 
ing and challenging conditions of the time: the activities they 
will be expected to perform in the near future; the problems that 
they are likely to face; the difficulties that they will run up 
against; opportunities for self-realization etc. Much depends on 
the philosophy that the nation holds. The nation with a demo- 
cratic social set-up expects its citizens to develop critical minded- 
ness and the decision-making skills. What are the financial and 
manpower resources that condition the attainability of certain 
objectives? What are the other important factors that put cons- 
traints on qualitative and quantitative expansion and improve- 
ment of education? Such an analysis of our present life reveals 
the kind of behaviours that students should develop in order to 
become better citizens of tomorrow. Therefore, societal needs 
and social philosophy are basic to the derivation of objectives. 


(c) Requirement of discipline 

The nature and developmental level of knowledge in a discipline 
is also useful in determining the educational objectives. Since 
vast knowledge is accumulating at a tremendous rate we have
to make a choice between what is worth-teaching or worth- 
learning and what is worth-eliminating. How best can the nature 
of the discipline be reflected with ease and economy? What is 
the opinion of experts and specialists in the various disciplines? 
What do the reports of various commissions and committees say 
about the preferred objectives? Answers to questions like these 
will go a long way in determining the educational objectives. 
(See chart on the next page). 


Sources

Needs of the Learner          Demands of the Society        Fund of Knowledge

Needs and potentialities      Socio-economic demands        Choice for worthwhile knowledge
Knowledge and understandings  Problems and constraints      Knowledge in the discipline
Skills and abilities          Resources and opportunities   Documented knowledge in reports
                                                            and literature
Attitudes and interests       Social setup                  Opinion of experts
Psychology of learning        Philosophy of life


3.3. Criteria for Selection


If a list of specific objectives is prepared it may run into
hundreds in a particular subject. It is, therefore, essential that
a selection be made on the basis of some criteria agreed
upon by the specialists in the field. The following criteria may be
useful for selection of objectives:

(i) Objectives should be worthwhile and have educational
significance.

(ii) They should agree with the broader goals of education.

(iii) They should be in accordance with the psychology of
learning.

(iv) They should be comprehensive enough to cover all areas of
human development.

(v) They should be attainable under the school conditions.

(vi) They should be testable in terms of observable and verifiable
changes.

(vii) They should be acceptable to teachers from the standpoint
of teaching resources, availability of time and instruction.


4. Stating Objectives at Various Levels 


Objectives can be identified and stated at various levels, right
from the national level to the period-wise or lesson level.
National level objectives emerge from the constitution, which
provides the values to the society. The objectives at the national
level are the most global. The Kothari Commission (3) lists four
major objectives, viz. education for production, education for
national integration, education for modernisation and education
for development of spiritual and moral values. It is these
objectives which become the source for formulating objectives
at the state level, which in turn pave the way for delineating
stage-wise objectives (elementary, secondary and university).
Then come the objectives at the subject level, which are drawn
in conformity with the stage-wise objectives. From the instructional
objectives of a subject we identify the objectives of a unit
of teaching. Unit-wise objectives are basic to the listing of
period-wise or lesson-wise objectives.

Thus we see that a link is established among objectives at
various levels. There is some invisible thread connecting the
objectives between any two levels. At the highest level the objectives
are too abstract to be directly achievable, but as we descend from a
higher level to a lower level they become more concrete. Who are
the persons to formulate objectives at various levels? Everyone from
the student and the classroom teacher to economists and politicians
is involved, depending upon the level at which the objectives are to
be formulated. The following chart indicates the personnel who
are usually considered appropriate. In actual practice, however,
state agencies concerned with curriculum development do
not always take advantage of the expertise indicated in the chart
given below:


Level                 Suggested Personnel

(a) National level    Educationists, Economists, Sociologists, Politicians,
                      Scientists, Psychologists and Administrators
(b) State level       Educationists, Administrators, Economists, Sociologists,
                      Curriculum Experts and Teachers
(c) Stage level       Educationists, Administrators, Curriculum experts,
                      Teachers, Parent Associations
(d) Subject level     Subject experts, Curriculum experts, Teachers and
                      Subject associations
(e) Unit level        Curriculum experts and Teachers
(f) Lesson level      Teachers and Pupils


To appreciate the link between the objectives at various 
levels we may examine the following illustration of objectives 
stated at all the six levels. 


L-1. To modernise education for best exploitation of natural 


122 Handbook of Pupil Evaluation 


7. Classification of Educational Objectives 
Depending upon the basis of classification, educational objec- 
tives can be categorised as under: 


7.1. Two-way Horizontal Classification 


Criterion                Classification             Illustrations

1. Level of generality   (a) General versus         (i) Ability to solve problems
                             specific               (ii) Ability to add whole numbers
2. Time contingency      (b) Ultimate vs.           (i) To develop good health habits
                             proximate              (ii) To know four basic food groups
3. Concreteness or       (c) Tangible vs.           (i) To acquire knowledge of social
   tangibility               intangible                 studies facts
                                                    (ii) To develop social sensitivity
4. Course coverage       (d) Single vs. multiple    (i) To develop dissectional skills
                             course objectives          among students
                                                    (ii) To develop ability to judge
5. Operativeness         (e) Stated vs.             (i) To develop critical mindedness
                             functional             (ii) To identify facts from opinion
                                                        OR to identify underlying
                                                        assumptions
6. Area of development   (f) Cognitive vs.          (i) To develop understanding of
                             non-cognitive              parts of speech
                                                    (ii) To develop appreciation for
                                                        music
                                                    (iii) To handle clinical thermometer
                                                        efficiently
7. Type of response      (g) Open vs. closed        (i) To develop a plan of action for
                             (varied or                 reducing noise pollution
                             stipulated)            (ii) To identify three past perfect
                                                        sentences in a given para
8. Type of learning      (h) Product vs.            (i) To develop concept of osmosis
   outcome                   process                (ii) To develop observing power
                                                        of pupils




Likewise more bases could be used to have two-fold or three- 
fold classification of educational objectives. 


7.2. Multiway Horizontal Classification 


A number of attempts have been made to classify objectives.
As early as 1918 the Commission on Reorganisation of
Secondary Education in America gave the following objectives (5):

(a) Health
(b) Command of the fundamental processes
(c) Worthy home membership
(d) Vocation
(e) Citizenship
(f) Worthy use of leisure
(g) Ethical character


7.3. Multiway Vertical Classification 


(i) Ebel (1966) classified the various classroom tests into the
following categories (6):

(a) Understanding of terminology (vocabulary)
(b) Understanding of facts and principles (generalisations)
(c) Ability to explain or illustrate (relationships)
(d) Ability to calculate (numerical problems)
(e) Ability to predict (what is likely to happen under specified
conditions)
(f) Ability to recommend appropriate action (in some specific
practical problem situation)
(g) Ability to make an evaluative judgement


(ii) Guilford (1967) in his structure of intellect has described 
three faces of intellect, namely the content, operation and 
the product each of which is defined as under (7): 


126 Handbook of Pupil Evaluation 


of application as shown below: 


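[Figure: two hierarchies shown side by side. Bloom: Knowledge,
Comprehension, Application, Analysis, Synthesis, Evaluation, in
ascending order. Madaus: Knowledge, Comprehension, Application in
ascending order, with Analysis and Synthesis shown side by side above
Application, and Evaluation above Synthesis.]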


(iv) Decorte (1973)

The four dimensions of this classification model are:

1. The subject matter of specific content of a given
universe of objectives.
2. The domain of information to which the subject matter
belongs.
3. The product: the formal aspect of information the
objectives produce.
4. The operation, as defined in Guilford’s model:

A. Receiving-reproducing operations
4.1 Perception of information
4.2 Recall of information
4.3 Reproduction of information

B. Productive operations
4.4 Interpretive production of information
4.5 Convergent production of information
4.6 Evaluative production of information
4.7 Divergent production of information

This system integrates Bloom’s and Guilford’s models and
becomes a tool for defining cognitive objectives.


(v) Hannah and Michaelis (1977) 

The framework of instructional objectives given by these 
authors gives four categories or types of learning, each of which 
has a series of levels within it (11). 




1. Data gathering is the first category, which serves as a
foundation for the other three types of learning. Observing
and recalling are the two bases for data gathering, which
lead to the other three categories mentioned below:

2. Intellectual processes (Cognitive domain) 

3. Skills (Psychomotor domain) 

4. Attitudes and values (Affective domain) 


Cognitive domain (Intellectual processes) 

According to the authors, intellectual processes are all actions 
involving mental or cognitive abilities arranged in a hierarchical 
order based on increasing complexity as shown below: 


 5. Inferring         10. Evaluating
 4. Generalising       9. Predicting
 3. Classifying        8. Hypothesising
 2. Comparing          7. Synthesising
 1. Interpreting       6. Analysing

Data gathering
Observing and Recalling


(vi) Singh (1977) 

Working on empirical validity of Bloom’s taxonomy the author 
himself found that as far as analysis and synthesis are concerned, 
these categories seem to be redundant as they appear to be a 
part of the application objective as evidenced from empirical 
validity. The other categories seem to be hierarchised as shown 
here (12). 


Knowledge -> Comprehension -> Application -> Evaluation


(vii) N.C.E.R.T. Taxonomy (1986) 

N.C.E.R.T. taxonomy is based on more than 20 years of usage 
in the field. It is validated by thousands of teachers involved 
in evaluation workshops for development of objective based


114 Handbook of Pupil Evaluation 


resources of the country. 
L-2. To conserve natural resources by preserving forests.

L-3. To conserve animal life for maintaining balance in
nature.

L-4. To understand the various food webs, food chains and 
food pyramids in a forest. 

L-5. To study various types of food pyramids. 

L-6. To know the concept of producer, consumer and decom- 
poser. 


The list is only partial, just to show the nature of objectives 
which can be traced from one level to the other level downward. 
In fact at each level there will be more than one objective and 
each objective when further derived will provide a list of more 
than one objective at each level as shown in the following sym- 
bolic derivation from level 1 to level 6. 


Level-1   Objective-1   Objective-2   Objective-3   Objective-4
Level-2   O1   O2   O3
Level-3   O1   O2   O3
Level-4   O1   O2   ...
Level-5   O1   O2   O3
Level-6   O1   O2   O3


Similarly, Objectives 2, 3 and so on will be worked out
downward. The process of deriving objectives can be summarised
as follows:


1. Develop understanding of the role of objectives and the need
for formulating a list of objectives.

2. Identify and state the general purpose of the programme or
the project so that participants in the educational process
look to the common referents.

3. Trace the programme or project goals from item 2 above.
This may be in operational terms indicating the logical
sub-divisions.

4. Sub-categorise each goal into a statement of objectives with
more specific behaviour referents.

5. Develop a time frame in the statement of objectives so that
terminal objectives can be differentiated from enabling
objectives.

6. Revise the list of goals and objectives in the light of
experience gained in planning, operating and evaluating the
programme.
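The downward derivation of objectives from the national level to the lesson level, described in Section 4, can be pictured as a tree in which each objective gives rise to one or more objectives at the next level. The sketch below is only an illustration: the class name `Objective`, the helper `trace` and the level labels are assumptions, while the statements are the L-1 to L-6 illustration used in this chapter.

```python
from dataclasses import dataclass, field

# Level names follow the chart of suggested personnel (a) to (f).
LEVELS = ["National", "State", "Stage", "Subject", "Unit", "Lesson"]


@dataclass
class Objective:
    statement: str
    level: int = 1                      # 1 = national ... 6 = lesson
    derived: list = field(default_factory=list)

    def derive(self, statement):
        """Add an objective at the next lower level and return it."""
        if self.level >= len(LEVELS):
            raise ValueError("lesson-level objectives are not derived further")
        child = Objective(statement, self.level + 1)
        self.derived.append(child)
        return child


def trace(objective, depth=0):
    """Return indented lines showing how objectives descend level by level."""
    label = LEVELS[objective.level - 1]
    lines = [f"{'  ' * depth}L-{objective.level} ({label}): {objective.statement}"]
    for child in objective.derived:
        lines.extend(trace(child, depth + 1))
    return lines


root = Objective("To modernise education for best exploitation of natural "
                 "resources of the country.")
node = root
for text in [
    "To conserve natural resources by preserving forests.",
    "To conserve animal life for maintaining balance in nature.",
    "To understand the various food webs, food chains and food pyramids in a forest.",
    "To study various types of food pyramids.",
    "To know the concept of producer, consumer and decomposer.",
]:
    node = node.derive(text)

print("\n".join(trace(root)))
```

Only one branch is built here; in practice each objective would have several derived objectives, as the symbolic derivation from level 1 to level 6 indicates.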


5. Attributes of Usable Objectives 


By and large curriculum specialists agree to the statement of 
objectives in the form of a complete statement. They should also 
be comprehensive enough to include cognitive, affective and 
psychomotor areas of students’ development. Since the list of 
objectives may be quite lengthy it may be necessary to curtail 
this sometimes keeping in view the needs and resources. There- 
fore the priority-order of goals and objectives must also be 
specified so that emphasis may be given accordingly. 

While stating an instructional objective, the following ele- 
ments may be noted for understanding the complete statement 
of an objective (4).


5.1. Elements of an Objective 


(a) Content: What is the object of study? What content, i.e. a
fact, concept, principle or any other content element, is the
focus of learning? Thus a content object is basic to the statement
of any objective. Three basic types of fruits, four types of
adjectives, factors affecting climate etc. are examples of
object/content.


(b) Behaviour communicativeness: Specification of the beha- 
vioural elements is essential to indicate that an objective has 
been achieved. What behaviours should a student exhibit to 
demonstrate the understanding of 3 types of fruits? Defining the 
3 types of fruits or citing illustrations of each type or classifying 
the given fruits into 3 categories etc. could be an intended 
outcome, e.g. the ability to classify fruits. 




(c) Conditions of occurrence: Under what conditions and
facilities can the student work and show his mastery of the
objective? Conditions should be specified to describe any special
arrangement.


Example


Given the basis, the students will be able to classify the different
fruits into various categories.


(d) Performance level: What is the acceptable level of
performance? What is the criterion of success? Should students
classify fruits with 100% accuracy, 50%, 33% or some other
level of accuracy?


Example


Given the basis, students should be able to classify 9 out of 10
fruits correctly.


(e) Target audience/individuals: Should 100%, 90% or some
other percentage of students achieve the objective?


Example


Given the basis, 80% of the students should be able to classify
8 out of 10 fruits correctly.


(f) Time of performance: When should the objective be
attained: at the end of the year, term, unit or lesson? This
should be reflected in the objective.

Example


Given the basis, 80% of the students should be able to classify 8
out of 10 fruits correctly by the end of the unit.


(g) Appropriate level of specificity: Since an objective can be
specified at different levels, ranging ... trivial details, like
the ability ...

The pupil understands

(a) the biological terms, facts, concepts, principles etc.

(b) the chemistry of various physiological processes in man.

(c) the mechanism of digestion in man.

(d) the role of various enzymes in digestion.

(e) the function of ptyalin in digestion.


In this example the objective is specified at five levels.
Depending upon the level at which you are teaching, an
objective can be stated.
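The elements (a) to (f) above can be brought together in one record from which the complete statement is assembled. This is a minimal sketch under stated assumptions: the class `ObjectiveSpec`, its field names and the method `is_attained` are hypothetical and are used only to show how each element contributes to the final statement and to the decision on attainment.

```python
from dataclasses import dataclass


@dataclass
class ObjectiveSpec:
    content: str              # (a) object of study, e.g. "fruits"
    behaviour: str            # (b) observable behaviour, e.g. "classify"
    condition: str            # (c) condition of occurrence
    correct_needed: int       # (d) performance level: items to get right
    items_total: int          #     out of this many items
    percent_of_students: int  # (e) target audience
    deadline: str             # (f) time of performance

    def statement(self):
        """Assemble the complete statement of the objective."""
        return (
            f"{self.condition}, {self.percent_of_students}% of the students "
            f"should be able to {self.behaviour} {self.correct_needed} out of "
            f"{self.items_total} {self.content} correctly {self.deadline}."
        )

    def is_attained(self, scores):
        """scores: number of items each student classified correctly."""
        if not scores:
            return False
        passed = sum(1 for score in scores if score >= self.correct_needed)
        return passed * 100 >= self.percent_of_students * len(scores)


spec = ObjectiveSpec("fruits", "classify", "Given the basis", 8, 10, 80,
                     "by the end of the unit")
print(spec.statement())
print(spec.is_attained([9, 8, 10, 7, 8]))   # 4 of 5 students reach 8 out of 10
```

With the invented scores shown, four of the five students classify at least 8 fruits correctly, which is exactly 80%, so the objective counts as attained.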


5.2. Terminal, Enabling and Facilitating Objectives 


A terminal performance objective is the desired outcome of learning
experiences expressed in terms of the observable behaviour
of the learner. According to Mager (1962), “It is a description
of a pattern of behaviour we want the learner to be able to
demonstrate” (4). The behaviour is overt and is recognisable at the
end of a specified time. Output objectives and task descriptions
are other terms used synonymously (5) by writers
like Gagne (1965).


Example 
To improvise an apparatus illustrating the process of osmosis 
when the needed equipment is available in the laboratory. 


An enabling objective is the desired observable behaviour
that occurs earlier in time than the terminal behaviour, i.e. it
becomes an entering or pre-requisite behaviour for the terminal
performance behaviour. Just as addition is pre-requisite to
multiplication, understanding the concept of osmosis is
pre-requisite to improvisation of the experimental set-up for
demonstrating osmosis, although both are expressed in the same
way, i.e. in terms of the specific observable behaviour of the
learner. The facilitating objective generally refers to the
behaviour of others (administrators, planners etc.). Such objectives
involve activities such as making a film, developing learning units,
preparing a teacher guide, remedial exercises etc. Such objectives,
therefore, facilitate the attainment of terminal and enabling
objectives.


5.3. Open and Closed Objectives 


Open objectives are descriptive while closed objectives are
prescriptive. Therefore, the former involve divergent thinking
leading to varied responses while the latter involve convergent
thinking calling for one correct or the best response. Some
objectives, however, may contain both closed and open elements.
Both types of objectives are used in specifying learning outcomes.
The following examples are representative of the two
types.


1. Given 10 out of 50 difficult words, the student will write 
the correct spellings of 10 words within two minutes. 
(closed) 

2. In an outline diagram of an animal cell the student will draw 
the four major organelles as given in the text. (closed) 

3. After reading the chapter on pollution each student will 
suggest three measures of his own to check water pollution. 
(open) 

4. Given an unseen passage the students will write its summary 
in about one third of its length. (open) 

5. The student will demonstrate his ability to interpret the 
given diagram of an experimental set up by 


(a) identifying the three major components of the experiment.
(open)

(b) describing two points relating to control and experimental
variables. (open)

(c) stating the purpose of the experiment. (closed)


The first and second objectives are closed, the third and
fourth are open while the fifth is a combination of both.


5.4. Overt and Covert Behaviour 
Overt behaviour is observable while covert behaviour is not
observable. Listing, describing, classifying, stating and answering
questions are examples of overt behaviours. Intellectual
processes like interpreting, generalising and evaluating are
covert behaviours, as are responding, performing and other
behaviours reflecting attitudes and values. Covert or primary
behaviour is clearly identified as it is carried out first, while overt
or secondary behaviour follows the covert behaviour
and provides a guideline for the assessment of learning.

The following two examples are illustrative— 


1. Students will demonstrate their ability to interpret by 
drawing a floral diagram of a given flower. 
(Interpret is covert behaviour and drawing floral diagram 
is overt behaviour.) 

2. Students will apply their punctuation skills by writing a
paragraph with correct punctuation marks.
(Apply is covert behaviour and writing is overt behaviour.)


6. Stating Instructional Objectives 


Different curricula show different modes of statement of objectives.
The following list of objectives, together with the improved
versions, indicates the criteria for stating objectives and the
common errors observed in statements of objectives at the senior
secondary stage.


Poor (1)                                 Better (2)

(i) To enable the pupils to              (i) The pupil understands
    understand scientific principles.        scientific principles.
(ii) The pupil studies the clinical      (ii) The pupil handles the clinical
    thermometer thoroughly.                  thermometer efficiently.
(iii) The pupil understands and          (iii) The pupil applies biological
    applies biological principles            principles to new situations.
    to new situations.
(iv) The pupil develops skill.           (iv) The pupil develops skills in
                                             reading a barometer.
(v) (a) The pupil develops problem       (v) The pupil develops the ability to
    solving skills in mathematics.           analyse a given problem to know
                                             what is given and what is required,
    (b) The pupil develops the


128 1 Handbook of Pupil Evaluation 


test items, topic tests and sample question papers, organised in 
collaboration with various examining boards. Objectives under 
the cognitive domain are listed hereinafter (13). 


NCERT Taxonomy Objectives

1. Knowledge        2. Understanding     3. Application

1.1 Recalls         2.1 Translates       3.1 Analyses
1.2 Recognises      2.2 Illustrates      3.2 Hypothesises
                    2.3 Identifies       3.3 Suggests
                    2.4 Detects          3.4 Establishes
                    2.5 Compares         3.5 Reasons
                    2.6 Classifies       3.6 Generalises
                    2.7 Interprets       3.7 Predicts
                    2.8 Explains         3.8 Judges
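One way of using the table above, for instance when sorting specifications for an item bank, is to look up the action verb of a specification and read off the objective it falls under. The sketch below assumes a simple word-by-word lookup; the names `NCERT_COGNITIVE` and `objective_for` are hypothetical, and the verb lists are copied from the table.

```python
# Verbs listed under each NCERT cognitive objective (from the table above).
NCERT_COGNITIVE = {
    "Knowledge": ["recalls", "recognises"],
    "Understanding": ["translates", "illustrates", "identifies", "detects",
                      "compares", "classifies", "interprets", "explains"],
    "Application": ["analyses", "hypothesises", "suggests", "establishes",
                    "reasons", "generalises", "predicts", "judges"],
}

# Reverse lookup: verb -> objective.
VERB_TO_OBJECTIVE = {
    verb: objective
    for objective, verbs in NCERT_COGNITIVE.items()
    for verb in verbs
}


def objective_for(specification):
    """Return the objective of the first listed verb found, else None."""
    for word in specification.lower().split():
        word = word.strip(".,;:")
        if word in VERB_TO_OBJECTIVE:
            return VERB_TO_OBJECTIVE[word]
    return None


print(objective_for("The pupil classifies the given fruits into three types."))
# Understanding
print(objective_for("The pupil predicts the result of the experiment."))
# Application
```

A lookup of this kind is only a first sorting aid; whether a specification really calls for understanding or application depends on the novelty of the situation presented to the pupil, which no verb list can decide.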


9. Affective Domain Taxonomies


(i) Krathwohl et al. (1964)

This taxonomy was developed by Krathwohl, Bloom and
Masia (1964), in which internalisation was used as the
organisational principle for the following levels (14).


1. Receiving           1.1 Awareness
                       1.2 Willingness to receive
                       1.3 Controlled or selected attention

2. Responding          2.1 Acquiescence
                       2.2 Willingness
                       2.3 Satisfaction

3. Valuing             3.1 Acceptance of value
                       3.2 Preference for a value
                       3.3 Commitment

4. Organisation        4.1 Conceptualisation of a value
                       4.2 Organisation of a value

5. Characterisation    5.1 Generalised set
                       5.2 Characterisation


(ii) Hannah and Michaelis (1977)

According to the authors, observing and recalling are the basic
processes involved in ‘data gathering’, which forms the foundation
of the cognitive, psychomotor and affective processes. In
the affective domain, the ‘attitudes and values’ are categorised in
the following manner in order of the complexity of commitment
involved:

5. Integrating
4. Preferring
3. Accepting
2. Complying
1. Responding

Data gathering
Observing and Recalling


This includes levels which describe the sequence of develop- 
ment of such interests, appreciations and adjustments as well as 
attitudes and values which schools try to encourage and develop 
in their students. Based on the principle of commitment, the 
levels are arranged in a hierarchical order. 


10. Psychomotor Domain Taxonomies 


(i) Simpson’s taxonomy (1966-67) gives the following levels for
objectives, arranged by the level of complexity and sequence in
the performance of a motor act (15):


1.0 Perception              1.1 Sensory stimulation
                            1.2 Cue selection
                            1.3 Translation
2.0 Set                     2.1 Mental set
                            2.2 Physical set
                            2.3 Emotional set
3.0 Guided response         3.1 Imitation
                            3.2 Trial and error
4.0 Mechanism               4.1 Habitual response
                            4.2 Patterned response
5.0 Complex overt           5.1 Resolution of uncertainty
    response                5.2 Automatic performance
6.0 Adapting and            (Suggested as a possible sixth level)
    originating




(ii) Alles (1967)
A major ordering principle in this taxonomy is “routinisation”,
and adaptability is the second, complementary criterion. Thus
the principle of adaptive routinisation is significant for the learning
of psychomotor skills. Accordingly, the following set of ordered
levels of execution in the psychomotor domain emerges (16).


1.00 Initiatory level of execution 
2.00 Pre-routine level of execution 
2.10 Pre-routine Non-adaptive sub-level 
2.20 Pre-routine Adaptive sub-level 
3.00 Routinised level of execution 
3.10 Routinised Non-adaptive sub-level 
3.20 Routinised Adaptive sub-level 


(iii) Dave (1971)
He included the following levels arranged in terms of the concept
of coordination (17):


1.0 Imitation               1.1 Impulsion
                            1.2 Overt repetition
2.0 Manipulation            2.1 Following precautions
                            2.2 Selection
                            2.3 Fixation
3.0 Precision               3.1 Production
                            3.2 Control
4.0 Articulation            4.1 Sequence
                            4.2 Harmony
5.0 Naturalisation          5.1 Automatism
                            5.2 Interiorisation


(iv) Harrow (1972)
Harrow developed a taxonomy to categorise movement behaviours
according to the following levels (18):


1.0 Reflex movements        1.1 Segmental
                            1.2 Intersegmental
                            1.3 Suprasegmental
2.0 Basic fundamental       2.1 Locomotor
    movements               2.2 Non-locomotor
                            2.3 Manipulative
3.0 Perceptual abilities    3.1 Kinesthetic
                            3.2 Visual
                            3.3 Auditory
                            3.4 Tactile
                            3.5 Coordinated
4.0 Physical abilities      4.1 Endurance
                            4.2 Strength
                            4.3 Flexibility
                            4.4 Agility
5.0 Skilled movements       5.1 Simple
                            5.2 Compound
                            5.3 Complex adaptive skill
6.0 Non-discursive          6.1 Expressive movements
    communication           6.2 Interpretive movements


(v) Hannah and Michaelis (1977)

As in the cognitive and affective domains, the levels categorised
under skills emerge from the foundational category of data gathering
through observing and recalling. Increasing independence
from assistance or direction in the performance of an act
constitutes the hierarchy of instruction as shown below (19):


5. Improvising
4. Applying
3. Mastering
2. Patterning
1. Imitating

Data gathering
Observing and Recalling


(vi) Maclay's taxonomy (1969)
This taxonomy is based on the work of E.J. Simpson (1966-67)
and comprises the following levels (20).


1.0 Perception 
1.1 Sensory stimulation 
1.2 Static cue discrimination 
1.3 Dynamic cue discrimination 
1.4 Translation 


2.0 Readiness for further experience following perception (1.0)
2.1 Mental readiness to select and synthesise 
2.2 Physical readiness to adapt 
2.3 Emotional readiness to respond 


3.0 Guided response 
3.1 Imitation 
3.2 Trial and error 
3.3 Following of written or other guidance 
3.4 Obtaining a series of simple experimental results 


4.0 Mechanism 
4.1 Choice of materials, apparatus, tools, reagents etc.
4.2 Planning the order of activities 
4.3 Exercising skill 
4.4 Execution of task 
4.5 Interpolation and/or prediction from results 


5.0 Complex response 
5.1 Adaptation 
5.2 Competent planning and confident procedure
5.3 Skilled execution and professional poise 
5.4 Clear and concise reporting 
5.5 Prediction of possible extended investigations 


6.0 Complex response 
As under 5.0 but without guidance 


11. Perception Domain 


Moore (1967)
In addition to the cognitive, affective and psychomotor domains,
Moore (1967) identified the need for a fourth domain, the
perceptual. For this domain, the following levels, arranged
in terms of integration, are given (21):


1.0 Sensation               1.1 Visual
                            1.2 Auditory
                            1.3 Tactile etc.
                            (Awareness of qualities of stimulus)
2.0 Figure perception       2.1 Size
                            2.2 Form
                            2.3 Location
                            2.4 Position etc.
                            (Figure-ground)
3.0 Symbol perception       3.1 Letters
                            3.2 Digits
                            3.3 Other signs
4.0 Perception of           4.1 Value of percept or symbol
    meaning                 4.2 Relations such as
                                (a) cause-effect
                                (b) symbol-percept
                            4.3 Abilities to generalise
                            4.4 Understand implications
                            4.5 Take decisions
5.0 Perceptive              5.1 Observation
    performance             5.2 Diagnostic ability
                                (a) Medical problems
                                (b) Artistic products etc.
                                (c) Electrical systems
                            5.3 Problem solving
                            5.4 Creativity


12. Experiential Domain 


(i) Steinaker and Bell (1975)
They have developed a model for classroom experience through
the experiential taxonomy. This is based on the Gestalt approach
and is called a functional tool by the authors of the taxonomy.
All the three domains are integrated in this taxonomy. Its
major aspects are:

1.0 Identification of       1.1 Motivation
    components              1.2 Interaction
                            1.3 Assimilation
                            1.4 Replication
                            1.5 Involvement
2.0 Development of          2.1 Terminal performance objectives
    curricular              2.2 Introduction and reinforcement
    experiences             2.3 Mastery sequence
                            2.4 Strategies
                            2.5 Skills
                            2.6 Evaluation
3.0 Format of Evaluation


All the taxonomies discussed above are useful in 


(a) preparing objectives, planning instruction and assessment. 

(b) identifying abilities implied under each objective. 

(c) developing curriculum content and methodologies of 
instruction. 

(d) item banking. 

(e) understanding relationships among objectives of different
categories in various domains.

(f) reorientation of courses of studies in teacher education.

(g) developing objective-based evaluation procedures. 


13. Criterion Versus Norm-referenced Objectives 


We have already defined an objective as an intended product of
learning. This is reflected in the performance of students. While
writing objectives we may specify a criterion of acceptable
performance. According to Mager (1962) we should say not only
what the learner should be able to do but also how well he
should be able to do it before his performance is acceptable.
For example, “given twenty three-digit multiplication sums, the
students should be able to solve at least 15 correctly within half
an hour” describes not only the students’ behaviour but also the
level of performance needed to “pass”. It is presumed that all those
students who score less than 15 marks need remediation before
they proceed to the next unit of learning.
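
Mager's example can be read as a simple decision rule. The sketch below is only an illustration in Python and is not part of the original text: the class name, the function name and the student results are hypothetical, and it assumes that both conditions of the objective (at least 15 of the 20 sums correct, within half an hour) must be satisfied.

```python
from dataclasses import dataclass


@dataclass
class CriterionObjective:
    """An objective stated with a criterion of acceptable performance."""
    description: str
    total_items: int       # number of sums set
    criterion: int         # minimum number correct needed to "pass"
    time_limit_min: float  # time allowed, in minutes

    def is_met(self, correct: int, minutes_taken: float) -> bool:
        # Both the accuracy criterion and the time limit must be satisfied.
        return correct >= self.criterion and minutes_taken <= self.time_limit_min


def needs_remediation(objective, results):
    """Return the names of students who did not reach the criterion."""
    return [name for name, (correct, minutes) in results.items()
            if not objective.is_met(correct, minutes)]


multiplication = CriterionObjective(
    "Solve three-digit multiplication sums", total_items=20,
    criterion=15, time_limit_min=30)

# Hypothetical results: (sums solved correctly, minutes taken).
results = {"Alka": (18, 25), "Kumar": (14, 30), "Sanjay": (16, 35)}

print(needs_remediation(multiplication, results))  # ['Kumar', 'Sanjay']
```

Here Kumar misses the accuracy criterion and Sanjay exceeds the time limit, so under this reading both would be given remedial work before the next unit.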

This idea is comparable to the concept of a mastery test. A
mastery test covers the vital essentials of a course, which are
represented in its items and are considered essential for students to
know before they proceed to the next level of instruction. The
student must demonstrate that he has “mastered” certain basic
objectives at a predetermined level of performance. Such mastery
tests distinguish between students who have mastered and
those who have not mastered the basic essentials. In contrast to
this, discrimination tests, which we normally use in our
classrooms, are designed to differentiate students at different levels
of achievement irrespective of the standard of achievement. Thus
the criterion-referenced approach is like the programmed
instruction approach in which a student masters a small unit of
material. He then proceeds to the next one or, if need be, does
remedial work to master it and then proceeds to the next unit.
The two approaches can be appreciated from the following models,
one representing the traditional norm-referenced and the other the
criterion-referenced approach.


[Figure: two flow charts. Norm-referenced objectives: Unit objectives; Teach or learn new concepts; Assign marks. Criterion-referenced objectives: Unit objectives; Teach and learn new concepts; decision branch (Yes); Undertake remedial work.]


In the criterion-referenced model it is, therefore, imperative
to describe objectives in terms of criteria regarded as an
acceptable standard of performance which must be reached before the
student is allowed to proceed to the next learning unit. A
teacher, therefore, must appreciate how stating objectives in
behavioural terms, non-behavioural terms and criterion-referenced
terms is useful or is a handicap in evaluating students and for
feedback of evidence for instructional practices. Three examples
of stating objectives will further clarify this.


1. Non-behavioural
To use correct English grammar.


2. Behavioural
Given sentences containing grammatical errors, the students will
be able to rewrite them correctly.


3. Criterion-referenced
Given sentences containing grammatical errors, the student will
be able to rewrite correctly at least 80 per cent of the sentences.
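
The difference between a norm-referenced and a criterion-referenced reading of the same scores can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the scores are hypothetical, and the 80 per cent standard is taken from the third example above.

```python
def norm_referenced_ranks(scores):
    """Rank students relative to one another (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def criterion_referenced_status(scores, total, criterion_pct=80):
    """Compare each student with a fixed standard, not with other students."""
    return {
        name: "proceed" if 100 * score / total >= criterion_pct else "remedial work"
        for name, score in scores.items()
    }


# Hypothetical data: sentences rewritten correctly out of 10.
scores = {"Alka": 9, "Kumar": 7, "Sanjay": 6}

print(norm_referenced_ranks(scores))
print(criterion_referenced_status(scores, total=10))
```

The ranking always yields a first, second and third student whatever the level of achievement, whereas the criterion-referenced reading shows that only one of the three has reached the stated standard.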


14. Place of Objectives in Teaching and Testing 


Since specification of objectives is primarily meant for visualis- 
ing the scope and nature of each objective, it helps the teachers 
and evaluators to focus their attention on the terminal behavi- 
ours of students. It is these intended learning outcomes that 
become the basis for evaluation as well as instruction. As far as 
testing is concerned, it is almost imperative that the testing 
situations should be so selected as to cover all these behaviours 
or learning outcomes or at least an adequate sample of those 
behaviours. Each of these behaviours can be tested singly or in
combinations of twos or threes, as we normally do by using
essay-type questions. Such a possibility of testing learning
outcomes one by one is quite desirable, especially when we are
interested in diagnosis of pupils’ weaknesses and providing
remedial instruction to improve their achievement.

On the other hand it should not be construed that teaching
can also be done on a one-to-one basis. For example, it is quite
ridiculous to say, “I am now teaching for interpretation, then
for analysis and then for developing ability to hypothesise.” If
we purposely do this, it means that we teach for a specific
behaviour and test for that behaviour, then teach for another
behaviour followed by testing for the same behaviour.
Obviously, if we do that we are teaching at a low level. In
reality this is not the case because teaching is an integrated act.
It is neither possible nor desirable to visualise that a teacher
is teaching for one particular objective for ten minutes, for the
second objective for five minutes and so on. This is not the
intent of specifying objectives in behavioural terms. What is
needed is to appreciate the relationship between objectives,
teaching or learning and testing. An examination is a sampling
process while teaching is an integrative process. What is
worth realising is the need for objective-based teaching and
objective-based testing, so that the one is geared to the realisation
of objectives and the other to testing for those objectives.
How both teaching and testing can be dovetailed to the point
of instructional objectives vis-a-vis unit objectives is made
clear in the unit plan 14.1 given on pages 138-39.


14.2 Objective-based Questions Testing Specific Objectives 


Concept 
For germination of seeds, air, water and suitable temperature 
are necessary. 


Specific objectives that can be tested
1.0 Knowledge


1.1 Recalls
List the three conditions necessary for germination of seeds.


1.2 Recognises
Seeds can germinate in the absence of:
A. Moisture, B. Temperature, C. Oxygen, D. Soil.


2.0 Understanding 


2.1 Translates 
Define the term ‘germination’ in your own words. 


2.2 Interprets 
Three beakers as shown below were placed at room temperature
for a week. What does this experiment demonstrate?


Figure-A Figure-B Figure-C 


14.1 Unit Plan

Teaching point: Conditions necessary for germination


Objective 1
The pupil recalls the conditions necessary for germination.

Teaching-learning activities
1.1 The teacher explains the necessity of air, water and suitable
temperature for germination.
1.2 The pupil takes notes and reads the text.

Evaluation questions
1.1 Describe an experiment to show that air, temperature and water
are necessary for germination.


Objective 2
The pupil develops skill in setting up the three-bean experiment.

Teaching-learning activities
2.1 The teacher gives instructions and precautions for setting up
the experiment.
2.2 The pupil sets up the apparatus and performs the experiment.

Evaluation questions
2.1 Set the apparatus to demonstrate the three-bean experiment.


Objective 3
The pupil draws the diagram of the experimental set-up.

Teaching-learning activities
3.1 The pupil draws the sketch of the experimental set-up and
labels the various parts.

Evaluation questions
3.1 Draw a good sketch to illustrate the three-bean experiment.


Objective 4
The pupil compares the conditions available to the three seeds
used in the experiment.

Teaching-learning activities
4.1 The teacher puts questions on the following:
(a) Why do we use three seeds in the experiment?
(b) Why do we keep the middle seed half immersed in water?
4.2 The pupils try to answer the questions posed.

Evaluation questions
4.1 Why are air and moisture necessary for germination of seeds?
4.2 What will be the inference if we do not use the seed which is
kept under water?


Objective 5
The pupil develops the ability to apply his knowledge of
conditions for germination in explaining phenomena involving
an unfamiliar situation.

Teaching-learning activities
5.1 The teacher poses questions like the following:
(a) Why do maize seeds not germinate when sown deep in the soil?
(b) If there is heavy rain immediately after sowing of the seeds
they do not germinate. Why?
5.2 The pupil analyses the given situation and gives various
responses to the question posed.
5.3 The pupil asks questions for further clarification.

Evaluation questions
5.1 Why is dry grass spread on nursery beds when it is very cold?
5.2 A gardener puts sweet-pea seeds in the flower bed under
optimum conditions of temperature but they did not germinate.
What could be the possible reasons for this?


Note: This plan is only suggestive and not exhaustive. The main
purpose is to appreciate the logical relationship among the three
major components of the teaching-learning process.


2.3 Illustrates
Give one example of a seed that requires more moisture for 
germination and one requiring comparatively less moisture. 


2.4 Identifies relationship 
Why does a seed not germinate when sown too deep? 


2.5 Compares 
In what respect does germination of a gram seed differ 
from that of a castor seed? 


2.6 Locates error 
Observe the given diagram C of the experiment (Item 2.2)
to demonstrate the conditions necessary for germination and
point out the mistake, if any. Suggest rectification also.


2.7 Explains 
Explain in not more than 100 words the mode of germina- 
tion in a pea seed. 


2.8 Classifies 
Categorise the following seeds into two groups on the basis 
of their mode of germination. 
A. Pea. B. Gram. C. Castor. D. Maize. E. Sun Flower. 
F. Onion. 


3.0 Application 


3.1 Analysis 
Some quality seeds of wheat were sown in two separate
pots A and B using the same quality of seeds and sample
of moist soil and kept at the same place. The seeds in pot
A germinated whereas the seeds did not germinate in pot
B. Under what conditions would this experience be
relevant?


3.2 Hypothesis
When mustard seeds are scattered on a suitably prepared
soil, they germinate but maize seeds do not. But when
sown a little deep under the same soil maize seeds germinate
but not the mustard seeds.


3.3 Suggests procedure
Bigger seeds require more moisture than smaller seeds.
Suggest an experimental procedure to test this hypothesis.


3.4 Gives reason
Why do seeds generally not germinate when there is
heavy rainfall immediately after sowing?


3.5 Draws conclusion
What conclusion can you draw from the observations of the
experiment shown in figure C, item 2.2?


3.6 Predicts
What will happen to germination of seeds in figure B
(item 2.2) if the whole apparatus is removed from a lighted
room to a dark room?


4.0 Skill


4.1 Draws sketches
Draw a labelled diagram to show the experimental set-up
of the three-bean experiment.


These examples indicate how the same concepts can be used
to test different specific objectives.


15. Objectives and the Item Bank 


Development of an item bank cannot be thought of without
knowledge of instructional objectives. In fact each and
every question in a bank is objective-based, i.e. it tests one or
the other objective. The only thing that a teacher should be
cautious about is cognizance of the objective or the behavioural
outcome which each question attempts to test. In fact the
quality of a question bank is determined by the objectives tested by
the questions which form the bank. A collection of questions testing
merely knowledge of facts does not make a quality bank, whereas
a bank with a majority of its questions testing students’ ability
to apply, analyse or evaluate is definitely the superior question
bank. The greater the variety of questions under each objective,
i.e. the greater the coverage of specified behaviours implied by an
objective, the greater the validity of such a question bank would
be for the purpose of testing. It is, therefore, necessary that
teachers try to frame questions on selected concepts testing
different specific objectives (specifications) implied under various
objectives, as exemplified above under 14.2.
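
One way to keep track of the objective that each question tests is to tag every item in the bank with its objective and specification. The Python sketch below is an assumption made for illustration, not a scheme prescribed in this handbook: the `Item` record and the two helper functions are hypothetical, and the questions are taken from the examples under 14.2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    objective: str      # e.g. "Knowledge", "Understanding"
    specification: str  # e.g. "Recalls", "Compares"
    question: str


bank = [
    Item("Knowledge", "Recalls",
         "List the three conditions necessary for germination of seeds."),
    Item("Understanding", "Translates",
         "Define the term 'germination' in your own words."),
    Item("Understanding", "Compares",
         "In what respect does germination of a gram seed differ "
         "from that of a castor seed?"),
    Item("Application", "Suggests procedure",
         "Suggest an experimental procedure to test this hypothesis."),
    Item("Skill", "Draws sketches",
         "Draw a labelled diagram to show the experimental set-up "
         "of the three-bean experiment."),
]


def coverage(items):
    """Map each objective to the specifications covered by the bank."""
    covered = defaultdict(set)
    for item in items:
        covered[item.objective].add(item.specification)
    return {objective: sorted(specs) for objective, specs in covered.items()}


def share_beyond_knowledge(items):
    """Fraction of items that test more than knowledge of facts."""
    return sum(item.objective != "Knowledge" for item in items) / len(items)


print(coverage(bank))
print(share_beyond_knowledge(bank))  # 0.8
```

A bank in which most specifications of each objective are covered, and in which the share of items beyond knowledge is high, would correspond to the "quality bank" described above.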


16. Objectives as Help or Hindrance 


There is an overwhelming belief in the need for clarity and
specificity in stating educational objectives. The three major
reasons are that they provide the direction or goal towards which a
curriculum is aimed, facilitate selection and organisation of
content and learning experiences, and become a referent for
evaluation of the learning outcomes. As early as 1924, Bobbitt
(21) in his book, How to Make a Curriculum, listed nine areas
in which 160 major educational objectives were mentioned.
Pendleton listed 1581 social objectives for English; Guiler listed
more than 300 for Arithmetic in grades 1-6 and Billings
prescribed 888 generalisations for social studies. By 1930, however,
this movement collapsed because teachers could not manage
such long lists in the classroom. Again in the late
forties and during the fifties curriculum specialists started
emphasising the need for specific educational objectives. Herrick
(1950), Tyler (1951), Bloom (1956), Krathwohl (1964) and Harrow
(1972) were the outstanding curriculum specialists who gave
their classifications or taxonomies.

In spite of the tremendous work done in this direction,
specialists like Eisner (1967) are very critical of such specified
objectives formulated in advance. He argues that the process of
instruction is very complex and dynamic rather than mechanistic,
with numerous outcomes which cannot be listed in behavioural
and content terms in advance (22). One cannot predict
in advance many of the collateral or emergent outcomes.
Evaluation experts of today do not, however, claim that a priori
guesses can cover the entire outcomes of learning. All learning
outcomes which are amenable to description and identification
can be formulated and agreed upon. A statement of objectives
in operational terms does provide the needed framework and
basis for the teaching-learning process. What is perhaps more
agreeable is to start with the rational formulation of objectives,
which may be called ‘hypothesised objectives’ or constructed
objectives. At the end of instruction, some of them might be
modified, deleted or added, thereby leading to developed or
summative objectives. This middle-of-the-road approach of
hypothesised and emergent objectives can better claim to explain
and reconcile the two view points.

A second hindrance is the constraint that a discipline imposes.
It is alleged that more precision in specification of objectives is
possible in science subjects than in arts, where it may be
difficult or not even desirable to accept the same end-product.
Novelty, creativity and originality are unpredictable. This is
also challenged by Atkin (1963), according to whom (23) the
so-called relative simplicity in science and mathematics is illusory
if we examine the latest curriculum projects in Science and
Mathematics. Nevertheless, to my mind the nature and
structure of a discipline does create problems regarding
agreement on the intended outcomes of learning in the case of less
matured disciplines like History and Sociology as compared to
Mathematics and Physics.

The third hindrance that Eisner points to is that objectives
cannot be used as criteria or standards against which
achievement is measured. Making a distinction between the
application of a standard and the making of a judgement, he says that
where standards are socially defined, as in the case of logic or
the rules of grammar, it is possible to measure and compare
quantitatively, whereas aesthetic objects or reflective essays call
for qualitative judgements, to which such standards
are not applicable. This comment, as Hastings points out, is
based on a wrong concept of evaluation which equates it with
testing. Not all descriptions of pupils’ performance are
quantitative; some are qualitative also. So evaluation consists of
descriptions and judgements of those descriptions. To insist that
descriptions should only be quantitative is indeed harmful.

Another hindrance pointed out is that formulation of
objectives prior to curriculum activities is not psychologically the
most efficient way to proceed. Teachers can first identify the
appropriate activities, rich in educational potential, and then
identify the related objectives. Macdonald also supports this
view by saying that this is a heuristic device of formulating
objectives which are known only after completion of an act and
not before: the teacher starts with “What am I going to do or
teach?” and not with “What am I trying to accomplish?” Addition,
deletion and modification of objectives is the strategy in all modern
projects on curriculum in the U.K. Stated objectives, during the
instructional process, get metamorphosed into new outcomes. But this is
possible only when some advance list of objectives is formulated
on a rational basis and experience. Such constructed objectives
may later on be transformed into formative objectives during the
teaching-learning process and finally come out as summative or
emergent objectives.

Thus this controversy over specific objectives is more apparent
than real. If one accepts evaluation as a comprehensive concept,
involving judgements on both the qualitative and quantitative
aspects, and develops faith in the goal-directedness of
educational efforts, formulation and specification of educational
objectives would be useful not only in planning and developing
curriculum content and methodology but also in validating the
whole process of teaching and learning by reflecting the
intended learning outcomes. The only caution is that too much
fragmentation into unwieldy lists should be avoided to make it
more realistic and practicable in the classroom. Moreover,
such a list should be considered only tentative and must be
revalidated in the light of instructional outcomes resulting
from the pupil-teacher interaction. Once objectives are clearly
formulated, the evaluation process becomes better directed.


CHAPTER V


MODERN EVALUATION—A TOTAL 
SCHOOL CONCEPT 


1. Introduction 


Words like measurement, examination and test
have been in vogue ever since the formalisation of education to
judge the effect of instruction on students’ achievement. The
subjective judgement of the teacher, classroom questioning,
performance on a school assignment, quality of home work,
pupils’ products in the form of a chart, model or composition,
and observation of the behaviour of students are the modes of
assessment practised by teachers, formally or informally,
towards instructional ends. The focus of all these methods of
pupils’ assessment is always on the improvement of students’
learning. Since written work occupies a far more prominent
place than oral and practical work, the use of written
examinations in the form of periodical, terminal or annual tests has
come to stay in our educational system. More often than not,
such written tests are used for judging the end-product of
learning rather than as a teaching-learning device to improve
students’ achievement. The reason is obvious. The evaluation
concept is considered synonymous with an examination or
a test. This has narrowed down the scope of evaluation, which
in fact is a much broader concept and is integral to the total
teaching-learning act. Thus the nature of the evaluative process
can better be appreciated if evaluation is considered a broad,
dynamic and composite concept.




It is, therefore, intended that study of this chapter should
enable the reader to

(a) define the various terms like examination, measurement,
assessment, evaluation etc. in his own words.

(b) differentiate the nature, scope and role of diagnostic,
formative and summative evaluation in the teaching-learning
process.

(c) identify the relationship of evaluation with other components
of the teaching-learning process.

(d) interpret the basic principles of evaluation and their
implications.

(e) explain the basic characteristics of evaluation in the context
of the teaching-learning process.

(f) consider evaluation as a total school concept rather than an
occasional act of measurement of pupils’ attainments only.


2. Nature and Scope of Evaluation 


Various terms associated with evaluation have already been
described in the first chapter and need not be repeated. An
attempt can be made now to visualise the relationship among
the more common and interchangeably used terms on the basis
of the scope and function of each. Suppose we are interested
in judging the effectiveness of our instructional efforts in terms
of the impact on students’ learning or achievements. We might
then have to consider the following:

(a) The type of learning in the measurement of which we are
interested: intellectual, affective or psychomotor development.

(b) Cognizance of acceptable standards of performance regarded
as intended learning.

(c) The hypothesis to be tested in the process to serve as a
proof of evidence having a bearing on the expected learning.

(d) Process of collecting evidence about the expected learning
(examination, observation, opinion, products, analysis, etc.).

(e) The use of instruments appropriate to the evidence to be
collected.

(f) The use of criteria for appraisal of evidence (estimation or
accurate measurement).

(g) Making a value-judgement in terms of given criteria.


Expected end-product of learning, performance standard,
indicators of acceptable standards, mode of collecting evidences,
data gathering instruments, appraisal of evidence and
judgement making are, therefore, the corresponding nodal points in
evaluation. This leads us now to defining the related terms to
determine the nature and scope of each.


(a) Achievement refers to the intended learning in the cognitive,
affective or psychomotor domain regarded as the end-product
of learning, reflecting the desired standard of performance (1).

(b) Test refers to a product of a device used for determining
the truth or falsity, meaning or value of a hypothesis (2). A
question paper, a unit test or a term test is a tool or an
instrument of measurement which is used to collect the evidence
related to intellectual, affective or psychomotor development
(learning).

(c) Examination refers to the process of collecting evidence
about pupils’ achievement. A process may be a written, oral
or practical examination, an observational or inquiry technique,
testing or analysis of a product or a document etc. Thus testing is
one technique of examining, like the observational technique (3).

(d) Assessment refers to a teacher’s estimate of a value against
a certain standard. Therefore, it is only an estimate and not
an accurate evaluation. The term is appropriate in relation to
such traits as attitudes, personality, co-curricular activities and
S.U.P.W., where qualitative measurement (or non-measurement)
is called for, unlike written testing which lends itself better to
quantitative measurement (4).

(e) Appraisal refers to accurate valuation of some
characteristic or attribute against some criteria which are
usually developed by experts or knowledgeable persons in the
field (5).

(f) Evaluation refers to the process of collecting information
about a pupil's achievement through measurement (quantitative)
or non-measurement (qualitative) on the basis of which
judgements are formed and decisions made. Thus information
gathering, forming judgements and making decisions are the
major components of evaluation (6).

(g) Measurement refers to the act of ascertaining the extent
of a pupil's growth or achievement. It deals with comparison of a
quantity with an appropriate scale for the purpose of
determining the numerical value on the scale that corresponds to the
quantity to be measured. Thus it is an act of assigning numerals (7)
to an object according to certain rules (Lindquist). The
relationship between measurement and evaluation can be visualised by
the following equation (by Gronlund, 1965).


Evaluation = measurement + value judgement


Thus there cannot be any evaluation without measurement,
but the two are not synonymous. The following examples would
illustrate the relationship among various terms mentioned above.


(i) A unit test was used to measure students’ learning on the
unit ‘Respiration’. (Tool)

(ii) The observation of the teacher was the basis of judging
students’ behaviour in the morning assembly. (Technique)

(iii) On the basis of testing, students were identified for
remedial teaching. (Technique)

(iv) The annual examination alone was made the basis of
promotion of students. (Process)

(v) Most of the students were estimated as having good
health habits during the last term. (Assessment)

(vi) Alka secured 58 marks out of 100 in English.
(Measurement-quantitative)

(vii) Kumar got excellent in Hindi dictation.
(Measurement-qualitative)

(viii) In terms of mean performance (45) of the class Alka is
ranked among the top five students of her class.
(Appraisal)

(ix) Luna is the brightest boy in English and deserves a prize.
(Evaluation-judgement forming)

(x) Sanjay is one of the weakest students in the biology class
and should be given remedial instruction.
(Evaluation-decision-making)


The term evaluation is, therefore, no longer used as
equivalent to examination, measurement or assessment. Evaluation is
a process which is concerned ultimately with decision-making.
Such decisions may be about the process of learning, testing,
administration of instruction or the end-product of learning. For
decision-making, formation of sound judgements is a
pre-requisite. Without valid judgements proper decisions cannot be
taken. Judgements are formed in terms of certain referents,
purpose and methodology of comparison of performance.
Judgements in turn are based on the information collected
about students. Collection of information is, therefore, the
basic ingredient of evaluation.

Information gathering, information processing, judgement
forming and decision-making are, therefore, the four major
components of evaluation. We may now define evaluation as
under:


Evaluation is a process of collecting information about 
pupils’ learning and processing it to form judgements 
which in turn are used to take decisions. 
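
The four components named in this definition can be pictured as successive steps, each working on the result of the one before. The Python sketch below is a hypothetical illustration only: the marks are invented, and judging each pupil against the class mean is an assumed referent chosen for the example, not a rule laid down in the text.

```python
from statistics import mean


def process(marks):
    """Information processing: express each mark as a deviation from the class mean."""
    class_mean = mean(marks.values())
    return {name: mark - class_mean for name, mark in marks.items()}


def judge(deviations):
    """Judgement forming: compare each pupil with the chosen referent (the mean)."""
    return {
        name: "at or above class mean" if deviation >= 0 else "below class mean"
        for name, deviation in deviations.items()
    }


def decide(judgements):
    """Decision-making: act on the judgement that has been formed."""
    return {
        name: "continue" if verdict == "at or above class mean"
        else "remedial instruction"
        for name, verdict in judgements.items()
    }


# Information gathering: hypothetical marks out of 100 from a unit test.
marks = {"Alka": 58, "Kumar": 45, "Sanjay": 32}

decisions = decide(judge(process(marks)))
print(decisions)
```

The marks alone are only measurement; it is the judgement against a referent and the decision that follows which make the process evaluation in the sense defined above.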


3. Diagnostic, Formative and Summative Evaluation 


Whenever a prefix is added to evaluation, it restricts its
scope. As such, diagnostic, formative and summative
evaluation refer to only one phase of evaluation, emphasising the
diagnostic, developmental and judgemental aspects of evaluation.

Diagnostic evaluation is primarily meant for placement of
students on the basis of pre-requisite entry behaviours. It
determines the cause of repeated difficulties of pupils so as to
provide for remedial measures. It is used before the actual
teaching of the unit in hand as well as during developmental
teaching when pupils fail to grasp the subject. The focus of
diagnostic evaluation is on the identification of trouble spots
and their causes so that a deliberate attempt can be made to
improve pupils’ ability to benefit maximally from instruction.
Therefore, if the purpose of evaluation is to place the individual
on a learning continuum so as to enable him to benefit from
instruction, it is diagnostic evaluation.


Formative evaluation is an integral part of the
teaching-learning process and, therefore, cannot be considered in isolation
or apart from instruction. The focus of formative evaluation
is on improvement of students' learning and not on certification
of achievement. It refers to continuous evaluation by means
of unit tests, informal class tests, assignments, classroom
questions etc. It helps to diagnose pupils’ strengths and weaknesses.
The results of formative evaluation are used for further
improvement of instruction and pupils’ achievement by
adjusting ends and means. Classroom instruction ordinarily
involves formative evaluation. Thus, formative evaluation is
diagnostic on the one hand and developmental on the other.


Summative evaluation is considered as an end-of-the-term
or end-of-the-course activity. It is directed towards making
judgements about the end products of learning and is usually
undertaken in a formal manner. The focus of summative
evaluation is on the measurement of pupils’ achievement and
its certification. Term tests, annual tests and external
examinations, whether conducted by schools or a public agency, can be
categorised as summative tests. The purpose of summative
evaluation is classification and grading of students rather than
identification of pupils’ strengths and weaknesses. The results
of summative evaluation are used for certification and/or
passing judgement on students’ achievement.

The three types of evaluation are more or less inter-dependent,
representing the three major phases of evaluation in the
teaching-learning process. The results of summative evaluation
can be used, after analysis, for a diagnostic purpose and for
taking administrative decisions on re-classification of students or
for instructional changes. The results of formative evaluation
can similarly be used for classifying students into different
groups for remedial work. Likewise, the results of diagnostic
evaluation, whether before instruction or after instruction,
can be used not only for placement purposes but also for
grading students into mastery, partial mastery and non-mastery
groups for the purpose of remedial programmes. Therefore, it
is mainly the purpose and the use of the results of evaluation
that determine the premium to be laid on one or the other
type of evaluation. The following table would further clarify
the relationship among the three types of evaluation (8):


Basis of comparison: Summative, Formative and Diagnostic evaluation

1. Nature
   Summative: Judgemental
   Formative: Developmental
   Diagnostic: Corrective

2. Focus
   Summative: Measurement of achievement
   Formative: Improvement of achievement
   Diagnostic: Screening of deficiencies

3. Major functions
   Summative: Certification and grading
   Formative: Location of errors; feedback to learner and teacher
   Diagnostic: Proper placement for instruction

4. Timing
   Summative: End of unit, term, course
   Formative: During the teaching-learning process
   Diagnostic: Before actual instruction and after analysis of test results

5. Aspect of evaluation
   Summative: Cognitive, psychomotor
   Formative: Cognitive
   Diagnostic: Cognitive, affective, psychomotor; physical,
   psychological and environmental factors

6. Instrumentation
   Summative: ... exam., oral, practical
   Formative: Specially designed tests
   Diagnostic: Formative and summative tests; standardised tests;
   teacher-made instruments; observation, check lists

7. Sampling of objectives/content
   Summative: Related to weighted objectives/course units
   Formative: Specific to relevant tasks in hierarchy
   Diagnostic: Specific to educational behaviour; specific to
   non-educational factors

8. Analysis (item difficulty)
   Summative: Average 35-70%
   Formative: Cannot be prescribed
   Diagnostic: Ease 65% or higher

9. Interpretation
   Summative: Generally in terms of a group
   Formative: In terms of criterion
   Diagnostic: In terms of both norm and criterion

10. Reporting
   Summative: Total scores or sub-scores by objectives
   Formative: Pattern of pass-fail scores on each hierarchical task
   Diagnostic: Individual profiles by sub-skills


4. Evaluation—An Integral Part of the School Curriculum


More often than not, evaluation is considered as an end-of-
the-course activity rather than a part and parcel of the total
curriculum. One reason for this is the narrow concept of
curriculum conceived by teachers. Without going into the nuts
and bolts of the curriculum, it may suffice here to say that
“curriculum is the totality of learning experiences for which a
school undertakes the responsibility to reach predetermined
goals through continuous appraisal and adaptation of the end
products of learning and the process of learning”. From this
definition we may identify five fundamental questions which a
teacher can ask himself. These are:


(a) Why should I teach a particular subject (or a unit)? 
(b) What should I teach in that subject (or unit)? 
(c) How should I teach it well? (or how best can students learn?)
(d) How well have I taught it? (or how well have students learnt?)
(e) In what ways can I improve my teaching? (or students'
learning?)


The first question refers to instructional objectives, which
involve questions of values, preferences, desirability and
purposefulness and, therefore, reflect the philosophical basis of
the teaching-learning process. The second question relates to
knowledge in the discipline that forms the warp and woof of
the subject matter, i.e. the content of teaching or learning.
Content selection involves the purposes, social needs and
priorities which society demands. As such, this aspect rests on a
sociological basis. The third question relates to teaching-learning
strategies, the selection of which depends on both the objectives
and the content but is conditioned by the needs of the learner and
the psychology of learning and instruction. Thus the selection and
organisation of learning experiences rest mostly on a
psychological basis. The fourth question refers to performance
assessment, which depends on the first three aspects, namely the
objectives, content and methodology




of teaching. Validity and reliability are basic attributes of any
instrument of evaluation used to collect the data on the basis
of which judgements are formed. Thus the evaluation aspect
involves a scientific basis of measurement. The last question
relates to improvement. This requires regular feedback of the
evidence that accrues from the evaluation process, and involves
diagnosis, remediation and the adjusting of ends and means to
improve students' learning. Thus feedback brings in dynamism by
generating interaction among all the components of the curriculum
and may, therefore, be termed the ecological basis of the
teaching-learning process. The relationship among the five aspects
can be depicted in the form of the following diagram:


Objective --> Content --> Method --> Evaluation
    ^            ^           ^           |
    |            |           |           |
    +------------+-----------+-Feedback--+


5. Basic Principles of Evaluation


There are certain assumptions about the individual learner and 
the evidence that we get about him. It is on these assumptions 
that principles of evaluation can be identified. These assump- 
tions are as under: 


(a) The learner has measurable ability

According to measurement theory, an individual has some real
ability, be it intelligence or the ability to apply knowledge in
a new situation, that differs from individual to individual.
Thus each learner has some amount of this ability, which can be
measured. A learner's true score or true ability is estimated
from the observations or evidence collected through
measurement. Thus a learner has a true score.




(b) An observation contains error

A score obtained from an observation is not the same as
the true score because whenever we measure we make errors. If
the height of a person is measured 3 or 4 times with the same
tape by different persons, there would be some difference, maybe
1/24 or 1/36 of a centimetre. When such error is possible in
physical measurement, it is all the more likely in human or
psychological measurement.


(c) A true score is an observed score minus error score
In terms of measurement theory this assumption is written
algebraically as under:


X = t + e


where 'X' represents the observed score of the learner,
't' represents his true score or true ability,
'e' represents his error score.

Error is positive if the observed score is more than the true
score and negative if the observed score is less than the true
score. Accordingly it is added to or subtracted from the true
score. Indeed, the true score is never truly known because we
never know exactly the degree of error involved in a given case.
Suppose we do know that a pupil's IQ is 110, but when we measure
it the result may be 115 or 105, i.e. the score he gets may range
from 105 to 115. In these two cases we may write the score as
under, using the same formula:

X = t + e, i.e. 115 = 110 + 5, or
X = t + e, i.e. 105 = 110 + (-5).
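
The two cases above can be traced in a few lines of Python. This is only an illustrative sketch: it assumes the true score is known, which in practice it never is, and the function name `error_score` is a hypothetical one introduced here.

```python
def error_score(observed: float, true: float) -> float:
    """Return e in X = t + e, i.e. the observed score minus the true score."""
    return observed - true


true_iq = 110  # assumed known here purely for illustration

for observed_iq in (115, 105):
    e = error_score(observed_iq, true_iq)
    kind = "positive" if e > 0 else "negative" if e < 0 else "zero"
    # The observed score is always recovered as true score plus error score.
    print(f"X = {observed_iq}, t = {true_iq}, e = {e:+d} ({kind} error)")
```

The sign of `e` shows at a glance whether the observation over-estimates or under-estimates the true score.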

Thus the identification, prevention, estimation and toleration of
errors is basic to the principles of measurement. Such errors,
called measurement errors, decrease the accuracy of the observed
scores. Errors that make scores inconsistent are a matter of
unreliability, while those that arise because the information is
inappropriate for the judgements to be made are a matter of
invalidity.

Some basic principles help us to cope with the problem of
errors. TenBrink (1975) has listed four basic principles (9).


First Principle 


5.1. Errors can be Identified 


Many sources of errors can be identified and an effort made to 
prevent their recurrence. The following sources contribute to 
such errors. 


(a) Errors in information gathering instruments 
These may be due to: 


(i) inappropriate content, 
(ii) inappropriate sample of content, 
(iii) too difficult or too easy items, 
(iv) non-comprehensibility of the items, 
(v) ambiguity in the directions of items, 
(vi) lack of information about the length of test, time spent 
on observing etc., 
(vii) overemphasis or underemphasis on objectives. 


(b) Errors within the information gathering process 
These may be due to: 


(i) unclear directions in test administration. 
(ii) physical conditions for administration. 
(iii) technical incompetence of invigilators. 
(iv) computational and other errors in scoring. 
(v) tabulation and recording errors. 


(c) Errors within the evaluee himself
These may be due to:


(i) fluctuation of human traits, e.g. attitudes, interests etc.,
(ii) specific attitude towards the task being evaluated,
(iii) misunderstanding of directions and other tasks related to
what is expected of him,
(iv) test-wiseness or the lack of it,
(v) particular response set, i.e. tendency to respond in a
consistent fashion, like the tendency to guess,
(vi) guessing,
(vii) test-taking ability (or inability),
(viii) state of health of the individual, physical, mental or
emotional.


Second Principle


5.2. Errors can be Minimised


The more errors are prevented, the more valid and reliable is the
information we are likely to obtain. This can be ensured by:


(i) relating all tasks to the purpose of evaluation, e.g.
construction, selection, scoring, judgement making and use of
evidence should all be based on the reason or purpose of
testing,

(ii) avoiding ambiguity and inconsistency in the use of
language,

(iii) giving clear and concise instructions/directions,

(iv) obtaining a representative sample of information which
describes the population, by choosing an appropriate technique
of sampling and obtaining the sample.


Third Principle 


5.3. Errors can be Estimated 


If some estimate is made of the amount of error in the
measurement data, we can use it to temper judgements and improve
decision-making. To determine the magnitude of errors we can use
two strategies. One is to get empirical evidence; the second is
to look for circumstantial evidence, i.e. clues about the
validity of responses. For this, various measures can be adopted,
such as the following.


(a) For reliability
(i) Coefficient of stability, or consistency over time.
(ii) Equivalence, or consistency across different forms.
(iii) Equivalence and stability.
(iv) Internal consistency, or consistency across different parts
of a test.
(v) Scorer (inter-judge) reliability, or consistency across raters.


(b) For validity 

(i) Concurrent validity. 
(ii) Predictive validity. 
(iii) Construct validity. 
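
As a minimal sketch of how the coefficient of stability listed under reliability might be obtained, the Pearson correlation between two administrations of the same test can be computed. The choice of the Pearson coefficient is an assumption made here (the text does not prescribe a formula), the marks are invented for illustration, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical marks of the same six pupils on two occasions.
first_testing = [42, 55, 61, 48, 70, 66]
second_testing = [45, 53, 64, 50, 68, 69]

stability = pearson_r(first_testing, second_testing)
print(f"Coefficient of stability (test-retest): {stability:.2f}")
```

The same function could be applied to scores on two parallel forms (equivalence) or to marks awarded by two scorers (inter-judge reliability); only the pairing of the two lists changes.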


Fourth Principle 
5.4. Errors can be Tolerated (lived with) 


Some errors will always occur in spite of the best efforts to
obtain information in a careful and systematic way. Nevertheless,
this information can still be used fruitfully if it is cautiously
interpreted, checked and verified before use. For this we may
'hedge', i.e. avoid committing ourselves until the truth of the
information is substantiated:


(i) Take the observed score as an estimate of the true score 
which is accurate within certain limits. 

(ii) Make a hypothesis on the basis of the information gathered 
so that the accuracy could be tested to confirm or reject 
in the light of additional information. 

(iii) Confirm the accuracy or 'truth' of that score, i.e.
substantiate the information. Replicate the evidence to test the
'truth' of the information (obtain information in similar
circumstances) and supplement the evidence (obtain information
under different circumstances).
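
Point (i) above, treating the observed score as accurate within certain limits, is often put into practice with a score band. The sketch below uses the standard error of measurement from classical test theory, SEM = SD x sqrt(1 - r); this formula is not given in the text and is brought in here as an assumption, and the standard deviation and reliability figures are invented.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), with r a reliability coefficient between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float,
               width: float = 1.0) -> tuple[float, float]:
    """Limits of observed score plus or minus `width` standard errors."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - width * sem, observed + width * sem


# Hypothetical test: standard deviation 15, reliability coefficient 0.91.
low, high = score_band(observed=110, sd=15, reliability=0.91)
print(f"Observed 110; true score probably between {low:.1f} and {high:.1f}")
```

Reporting the band rather than the single score is one way of hedging: the more reliable the test, the narrower the limits become.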


6. Process Characteristics of Evaluation 
A good evaluation programme has the following academic
components or characteristics which reflect the evaluative
process.


6.1. Evaluation is Objective-centred 


Evaluation has a philosophical basis as it involves the question 
of values. What is worth learning or teaching is reflected in the 
formulation of objectives which may be intellectual, emotional, 
social, moral or physical in nature. Even when we accept a 
few or all of them, the question of preference and weightage 
comes in. Should we lay more stress on affective objectives 
than cognitive objectives at the primary stage? Should we 
teach moral values at this stage? Is socially useful productive 
work more important than the teaching of social studies? All 
such questions involve preference of values depending upon 
the idea of goodness or what is considered a desirable 
behaviour. In judgemental terms also, the evaluator has to be
cognizant of the referent he is going to use. Since judgements
can be made in terms of a class, an individual or a performance 
standard, it is again a question of values, preference or purpose 
in the light of which judgements are to be made. As evaluation 
involves some value judgement, we have to make a choice 
among the competing values or cherished values which become 
the basis for obtaining information as well as for interpretation. 
It is, therefore, suggested that evaluation should be based on 
predetermined instructional objectives and the evidence 
validated against those very objectives. 


6.2. Evaluation is a Continuous Process 


In order to make evaluation a tool of learning it has to be used
more frequently as a part of teaching-learning activities. Since
an individual grows all the time and acquires new behaviours, it
becomes imperative that the desirability and adequacy of these
behaviours be judged side by side. This is necessary to effect
changes in teaching and learning and to acquaint students with
what they have mastered and what is still to be mastered, in
order to reinforce learning and feed back the results for
diagnosis and remediation. This continuity can be ensured by
making performance assessment a part of the teaching-learning
activity. The more frequent the check-ups for regular feedback
on students' performance, the more continuous does the
evaluation process become. Continuity of evaluation brings
quality control into the process of learning. Therefore,
evaluation should be made completely integral to the
teaching-learning process instead of being treated as an
end-of-the-course activity.


Cognitive                  Social milieu                Cognitive
Psychomotor                Learning milieu              Psychomotor

(Pre-instruction)  ---->   (During instruction)  ---->  (Post-instruction)

Diagnostic evaluation      Formative evaluation         Summative evaluation

For placement of           For improving teaching-      For grading and
students                   learning strategies          certification of
                                                        achievement


6.3. Evaluation is a Comprehensive Process 


A pupil has different dimensions of growth: the intellectual, the
emotional, the social and the physical. These aspects are
represented in the form of cognitive, affective and psychomotor
objectives. Therefore, unless evaluation provides evidence on
all the aspects, vis-a-vis all the instructional objectives, it
cannot be considered comprehensive enough to be depended
upon. Likewise, coverage of both scholastic and non-scholastic
areas is a must in any good programme of evaluation. Apart
from testing all possible objectives and content areas,
comprehensive evaluation involves the use of different tools and
techniques to get different types of evidence on various aspects
of evaluation. Evaluation should, therefore, be considered a
comprehensive process.


6.4. Evaluation is a Cooperative Process 

Since comprehensive evaluation seeks evidence on all aspects of
a pupil's development, the teacher alone cannot get full evidence
about his growth. To collect evidence regarding social
relationships, emotional behaviours, initiative, scientific
attitudes, social attitudes, likes and dislikes etc., we need the
collaboration of the pupil, his peers, his parents and all those
teachers who watch him grow and develop. Therefore, for good
evaluation the cooperation of different individuals and agencies
is necessary and as such evaluation should be considered a
cooperative process.


6.5. Evaluation is an Integral Part of Instruction 


It is obvious from the teaching model referred to earlier that
evaluation is part and parcel of the instructional process. It is
not an end-of-the-course activity but is built into the
teaching-learning process. Instruction and evaluation go hand in
hand. Such evaluation is termed formative evaluation. Its focus
is on the improvement of instruction vis-a-vis pupils'
achievement. Unless evaluation serves this function through
immediate feedback, there cannot be improvement in the
teaching-learning process. Thus the role of evaluation as a
teaching device should be appreciated, besides its judgemental
role.


6.6. Evaluation is a Dynamic Process 


Evaluation is based on objectives of instruction but at the
same time it helps us to judge how far those objectives are
attainable for a particular group of students. Likewise,
evaluation is not only related to learning activities or
instructional procedures but also provides evidence as to the
effectiveness of the instructional efforts. Feedback loops from
evaluation to the other components of the teaching model indicate
how evaluation through regular feedback validates the whole
teaching-learning process and leads to continuous improvement in
a spiral fashion. Yet it is the feedback component of evaluation
which has received little attention in our system of evaluation.
Evaluation at present is concerned only with collecting data and,
to a limited extent, interpreting it. How to use the data for the
improvement of learning or the learning process is not considered
a useful activity. But it is this aspect that helps us to judge
the suitability of the objectives, the efficiency of the process
of teaching or learning, the quality of the measuring tool itself
and even the accountability of the administrative decisions.

It is on the basis of feedback data that the learner can be
informed of his adequacies to reinforce his learning. In the
earlier stages extrinsic feedback in the form of encouragement,
reward, incentive, etc. can be provided to the students. Regular
feedback from all evaluation acts may ultimately lead to
intrinsic feedback for the learner. Thus the most outstanding
feature of evaluation is its ecological aspect, which generates
interaction among the various components of the teaching-learning
process, thereby leading to improvement in the learner, teacher,
learning, teaching and even in administration. This interaction
among various components of the educational process, generated by
evaluation through the feedback mechanism, is indeed the spirit
of modern evaluation. Hence, evaluation should be considered a
service component of the teaching-learning process, to emphasize
the use of evaluation results through regular feedback to improve
teaching and learning.


7. Ecology of Educational Evaluation 


From the earlier discussion about the nature, scope and
characteristics of evaluation it is quite evident that the narrow
view of evaluation which equated it with testing is no longer
accepted. Evaluation is treated more as a process of
decision-making than as a technique of testing. Its scope is,
therefore, not limited to the evaluation of pupils' learning but
is extended to the process of learning. As such, the focus of
evaluation is more on the improvement of students' achievement
than merely on measurement of their achievement.

Since pupils' achievement is a function of the total school
environment, it is necessary that the decision-making process
take cognizance of all conditions, antecedents and transactions
which condition pupils' learning, so that right judgements are
formed. Thus the focus of modern evaluation is more on the
description and interpretation of students' adequacies or
inadequacies in learning than on the certification and prediction
of their achievement.

Traditional evaluation emphasizes the judgemental role and
neglects the diagnostic role. Unless the evaluator is concerned
with finding out the weaknesses in the pupils and in the
teaching-learning process itself, no administrative or academic
decisions are possible to improve their achievement or learning.
Evaluation, therefore, must be treated as an integral part of the
teaching-learning process. In other words, evaluation should
be treated as an intelligence service of the educational process.
The focus of present evaluation is, therefore, not only on using
evaluation as a device for the furtherance of pupils' learning
but also for improving instruction.

Of late, there has been a tendency to link pupil evaluation
with curriculum evaluation and programme evaluation. This is
obviously due to the fact that pupils' learning is conditioned
by curriculum prescriptions and instructional efficiency and,
therefore, pupil evaluation is a part of curriculum evaluation,
which in turn becomes a part of programme evaluation that
takes into account other variables like context, inputs,
strategies, processes etc. (10). It is, therefore, necessary to
appreciate this link and the interaction of the learner with the
curriculum objectives, content, learning process, the teacher
(instruction) and evaluation. It is for this reason that the
focus of the modern evaluator is on enabling the teacher to gain
more and more insight into the ecology of the evaluative process
so that some sort of equilibrium can be ensured in the
teaching-learning process. Thus it would not be wrong to say that
evaluation is a total school concept.

The figure depicting the integrative model of evaluation, given
below, indicates the place of evaluation in the teaching-learning
process and its role in generating interaction among various
components, thereby indicating the ecological basis of
evaluation.


[Figure: An Integrative Model of Evaluation. Labels recoverable
from the figure: for sequential units; requisite entry
behaviours; feedback; summarising and reporting.]


(vi) arranges ideas in a logical sequence. 
(vii) uses proper gestures, avoiding mannerism etc. 
(d) Develop criteria of assessment 
For example: 
(i) Vocabulary—10 marks 
(ii) Pronunciation—10 marks 
(iii) Grammatical correctness—10 marks 
(iv) Fluency— 5 marks 
(v) Thought content—10 marks 
(vi) Gesture and mannerism— 5 marks 
(e) Define each criterion 
(i) Vocabulary (correct word, richness) 


(ii) ...


(vi) ...
(f) Prepare a scale for each component
A five-point rating scale may be developed using a 4-0
range of marks corresponding to letter grades A, B, C,
D, E. For example, for vocabulary we may define the limits as
under:
(a) Always uses appropriate words.
(b) Frequently uses appropriate words.
(c) Usually uses appropriate words.
(d) Occasionally uses appropriate words.
(e) Seldom uses appropriate words.
(g) Develop marking scheme
On the basis of the weightage given to various criteria and
the rating of the examiner (A, B, C, D, E) on each criterion
we can award marks. If the weightage to vocabulary is 10 and
the rating of a student on vocabulary is C, then 10 x 2 = 20
will be his score on vocabulary.
(h) Decide about the mode of conduct of examination
It is better to have a panel of 2-3 examiners. They may
follow either holistic or analytical scoring. If the difference
between the two approaches is not significant, the
botheration of the analytical method could be avoided.



Examiners may either choose one or more dimensions
each, assess students on these dimensions only, and
then add the marks. Alternatively, each examiner may rate
each student on each dimension and then pool the total
marks for each student for comparison with the ratings of
other examiners.

It is advisable to have one internal examiner besides the
two external examiners. This might help students to
establish rapport with the examiners and enable them to
show their best.

(i) Devise questions and exercises 
(i) Questions for establishing rapport (not to be used for
assessment).
(ii) Questions of the short-answer variety, like guided
conversation.
(iii) Questions requiring long responses. In this case the
candidate is asked to select one out of 4-5 topics, gets
5-10 minutes for thinking and then speaks on the
topic.
(j) Interpret the responses 
This is important from the point of view of feedback to
schools. Overall, global summary information is not useful
for feedback. Information on each dimension should be
analysed and interpreted for feedback to the schools to
enable them to appreciate the deficiencies and plan for
remediation.
(k) Feed back the results for remediation
On the basis of detailed information provided by the
examiners, [...] may be taken to provide a [...]
programme to make up the deficiencies.
[...] oral testing in India appears to be merely on
the theoretical side. A lot of empirical data are needed for
judging the validity, reliability and practicability of oral
tests, especially for use [...] examinations. With all
its limitations, the [...] in judging oral
expression cannot be [...] overall framework of
evaluation, more particularly at the elementary stage.
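
The marking scheme of steps (d), (f) and (g) can be sketched in code. This is an illustration only: it takes the weightages listed in step (d), assumes the 4-0 scale of step (f) maps to grades A to E in that order, and multiplies weightage by grade points as in the worked example of step (g). The names `WEIGHTAGE`, `GRADE_POINTS`, `criterion_score` and `total_score`, and the sample ratings, are hypothetical.

```python
# Five-point scale from step (f): A, B, C, D, E carry 4, 3, 2, 1, 0 points.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

# Weightage per criterion, taken from the example in step (d).
WEIGHTAGE = {
    "vocabulary": 10,
    "pronunciation": 10,
    "grammatical correctness": 10,
    "fluency": 5,
    "thought content": 10,
    "gesture and mannerism": 5,
}


def criterion_score(criterion: str, grade: str) -> int:
    """Score on one criterion = weightage x grade points, as in step (g)."""
    return WEIGHTAGE[criterion] * GRADE_POINTS[grade]


def total_score(ratings: dict[str, str]) -> int:
    """Sum the criterion scores; every criterion must be rated exactly once."""
    if set(ratings) != set(WEIGHTAGE):
        raise ValueError("rate every criterion exactly once")
    return sum(criterion_score(c, g) for c, g in ratings.items())


# Hypothetical ratings given by one examiner to one candidate.
ratings = {
    "vocabulary": "C",
    "pronunciation": "B",
    "grammatical correctness": "B",
    "fluency": "A",
    "thought content": "C",
    "gesture and mannerism": "D",
}

print("Vocabulary:", criterion_score("vocabulary", "C"))
print("Total:", total_score(ratings), "out of", 4 * sum(WEIGHTAGE.values()))
```

Keeping the per-criterion scores, rather than only the total, preserves the dimension-wise information that step (j) asks to be fed back to schools.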


CHAPTER VI 


STEPS IN EVALUATIVE PROCESS 


1. Introduction 


The process of evaluation can be visualised in different ways,
depending upon the model of evaluation one seeks to propagate.
If we recapitulate our discussion in the previous chapter, we have
tried to emphasise the integrative nature of evaluation.
Evaluation is conceived as an integral part of the
teaching-learning process rather than a judgemental act to be
undertaken at the end of a unit, semester or session. Likewise,
our stress on its decision-making role must become our starting
point and the feedback aspect the last point in this process.
Accordingly, our sequential steps in explaining the evaluative
process would be the specification of decision points, devising
of situations, obtaining evidence, analysis and recording of
evidence, forming judgements, summarising, reporting and feedback
of information. It is, therefore, expected that by reading this
chapter a teacher should be able to:


(a) specify the intended goals of learning for a unit,
(b) devise situations relevant to appraisal of intended outcomes
of learning,
(c) collect the needed evidence using appropriate tools and
techniques,
(d) analyse and record evidence in terms of purposes, goals
and decisions,
(e) form accurate judgements and take appropriate decisions,
(f) summarise the evidence and report it meaningfully,
(g) feed back the evidence for reinforcement and applying
correctives,
(h) appreciate the integrative model of evaluation.


Each of the above-mentioned objectives can be achieved if
the reader is able to understand the various tasks that are
involved in the steps discussed hereinafter.


2. Steps Involved 


Step 1. Specifying intended outcomes of learning (decision points)
In any programme of evaluation, the foremost decision to be
taken concerns the purpose for which evaluation is being
undertaken. Is it selection, placement, treatment, certification
of achievement, classification of students, or judging the
effectiveness of the products or strategies of a programme?
Accordingly, the object of decision may be persons, products or
a programme. Will the evidence or data of evaluation be
used for administrative, instructional, guidance or research
purposes? When will the decisions be made: before, during or
after instruction? Is it the individual, the class or the
institution that would receive priority in the decision-making
process? Can the decisions so made be reversed or modified, or
are they irreversible? These are some of the significant decision
points which are to be visualised before specifying the intended
outcomes. In the case of classroom evaluation, teachers are
concerned mainly with the instructional objectives, and therefore
we may restrict our discussion to pupil evaluation in this
chapter.

Since we have advocated the integral approach to teaching
and testing, it is assumed that unit teaching and unit testing
will form the basis of the evaluation approach. As such,
specification of the intended outcomes of the unit (topic) of
teaching will be the first step in the process of evaluation.
While formulating objectives for the unit of teaching, care
may be taken to include all attainable objectives. Though
affective and psychomotor objectives are important and
complementary to cognitive objectives, they are generally
ignored, especially in external examinations, because of the
practical difficulty of teaching and evaluating them. This is
more true of affective objectives, while practical examinations,
wherever they are in vogue, do take care of psychomotor outcomes.

The instructional objectives of a unit, when stated, must be
specified in terms of expected learning outcomes. They must
be stated at such a level of generality or specificity that they
are neither too global to be tested nor so elemental or
specific as to make an unwieldy list. At the unit level, which
of the following objectives would you prefer to adopt?


(a) The pupil understands the different parts of speech in 
English. 

(b) The pupil understands the use of different types of 
adjectives. 

(c) The pupil cites examples of adjectives of quality. 

(d) The pupil understands the meaning of the word adjective. 


Since every objective has two major dimensions, viz. the
ability part and the content part, both must be reflected in its
statement. The content part should emphasise the major ideas,
concepts or principles and not merely factual information. This
is necessary because teaching and testing are both to be geared
to such objectives. Which of the following intended outcomes of
learning will be more suitable from this point of view?


(a) The pupil recalls properties of oxygen and hydrogen. 


(b) The pupil compares the chemical properties of oxygen and 
hydrogen. 


(c) The pupil develops skill in observation of plants. 
(d) The pupil develops skill in handling a clinical thermometer.


Some objectives are very important because they are
pre-requisites to other instructional goals. Such objectives play
a more significant role in the elementary classes and may be
termed process objectives. Observing, measuring, classifying,
using numbers, communicating etc. are process objectives
which can be stressed more in the elementary classes. Such
objectives later act as a means to product objectives like
those of knowledge, understanding, application, analysis,
synthesis and evaluation. Some objectives may reflect minimum
level goals while others may go beyond the minimum level. An
objective relating to the recall of certain biological facts may
be kept at a minimum level, say, of 50% attainment, but an
objective relating to the development of skill in handling a
clinical thermometer cannot be stated at a minimum level but
beyond it, perhaps at not less than the 90% level or so.
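
The idea of different attainment levels for different objectives can be shown with a small sketch. The 50% and 90% figures come from the paragraph above; the `Objective` class, the `has_attained` function and the pupil's 75% attainment are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    statement: str
    minimum_level: float  # required attainment, in per cent


def has_attained(objective: Objective, percent_attained: float) -> bool:
    """True when the pupil's attainment reaches the level set for the objective."""
    return percent_attained >= objective.minimum_level


recall = Objective("recalls certain biological facts", minimum_level=50)
thermometer = Objective("handles a clinical thermometer", minimum_level=90)

# The same hypothetical attainment of 75% is judged against each objective's own level.
for objective in (recall, thermometer):
    verdict = "attained" if has_attained(objective, 75) else "not yet attained"
    print(f"{objective.statement}: {verdict}")
```

Attaching the required level to the objective itself keeps the judgement tied to the stated outcome rather than to a single pass mark for the whole test.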

The ability dimension of each objective can be well defined
(1) in terms of an almost fixed number of specific outcomes of
learning in the form of competencies involving pupils' ability to
identify, translate, illustrate, relate, compare, classify,
interpret, explain, analyse, hypothesise etc. However, the
content dimension is unwieldy and has to be reflected in the form
of major themes, concepts, principles and processes. This serves
two purposes. First, we tend to de-emphasise factual details and
stress more the concepts and principles which really form the
substantive structure of the subject. Secondly, it is handier to
state the content aspect in a few words.


1.1. Who Formulates Specific Objectives 


Unlike the instructional objectives of the subject, which are
formulated through the cooperative enterprise of curriculum
specialists, subject experts and teachers, the specific intended
outcomes of a unit are to be identified or listed by the
teacher-evaluator who is concerned with both the teaching and the
evaluation of that unit. In the integral model of evaluation, it
is the teacher who specifies the intended outcomes of learning
and it is he who evaluates his students on those very outcomes.
But in the case of summative evaluation, when an external
teacher or an agency is entrusted with the evaluation of
students' learning on the whole or a part of the syllabus, as is
the case in a term test, semester or annual examination, these
objectives are already prescribed in the curriculum. The
evaluator has only to take cognizance of those prescribed
objectives. The first step is, therefore, dependent on the
purpose of evaluation and the use to which results are to be put.
Nevertheless, specification of intended outcomes of learning is a
must, whosoever undertakes its formulation (2).

It may be realised that the intended objectives are only hypo- 
thesised and all of them may not be achieved if pitched high. It 
is only after evaluation that we know the validity of these 
logically formulated objectives. Emergent objectives may not 
completely be in agreement with the stated (hypothesised) 
objectives. Emergence of some new objectives in the form of 
concomitant learning cannot be ruled out. 

Anyway, it is these specific objectives which become the 
reference point for validation of evidence that accrues during 
the process of evaluation. 


Step 2. [Appraisal of pre-requisite learning ) 

Since evaluation is treated as integral part of the teaching- 
learning process, the first phase of evaluation is concerned with 
testing for readiness so that it may be judged whether students 
have already acquired the minimum pre-requisite knowledge 
and skills in order to benefit maximally from the instruction 
that precedes evaluation of the instructional impact. In other
words the objectives specified for the unit can only be achieved 
if the concepts, skills and information basic to the intended 
outcomes of learning are learnt. One cannot teach about 
‘adjective’ without the knowledge of ‘Noun’, multiplication 
without addition and the mechanism of cellular respiration 
without the cell structure. Evaluation at this pre-instructional 
stage is diagnostic in nature to know whether there is adequacy 
in previous learning or not. Developmental teaching cannot 
be effective without this. Thus, depending upon the facilities
and the time available, one can resort to oral or written testing. 
If there is adequate learning, instruction (developmental 
teaching) can be undertaken. If there is inadequacy in 
learning, remedial action is called for. It may take the form of
reteaching, additional assignments, explaining, discussion, or 
any other form. 

After remediation (if needed) the students are retested for 
their readiness to know if adequate learning has taken place to 
take up developmental teaching; otherwise remedial work 
continues. 
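
The cycle of testing for readiness, remediation and retesting described above can be sketched in code. This is only an illustrative sketch: the function names, the pass mark and the limit on remediation rounds are assumptions made for the example and are not prescribed anywhere in the text.

```python
def ready_for_teaching(prerequisite_scores, pass_mark):
    """Return True when every prerequisite score reaches the pass mark."""
    return all(score >= pass_mark for score in prerequisite_scores.values())


def readiness_cycle(test_pupil, remediate, pass_mark, max_rounds=3):
    """Test for readiness, remediate weak prerequisites, and retest.

    test_pupil() returns a dict mapping prerequisite name -> score.
    remediate(weak) receives the prerequisites found below the pass mark.
    Returns the number of remediation rounds used, or None if the pupil
    is still not ready after max_rounds (an assumed limit).
    """
    for rounds_used in range(max_rounds + 1):
        scores = test_pupil()
        if ready_for_teaching(scores, pass_mark):
            return rounds_used          # developmental teaching can begin
        if rounds_used < max_rounds:
            weak = [name for name, score in scores.items() if score < pass_mark]
            remediate(weak)             # re-teaching, extra assignments, etc.
    return None


# Hypothetical pupil who is weak in addition before a unit on multiplication.
pupil = {"counting": 4, "addition": 1}


def oral_test():
    return dict(pupil)


def reteach(weak):
    for name in weak:
        pupil[name] += 2                # assumed gain from one round of re-teaching


print(readiness_cycle(oral_test, reteach, pass_mark=3))  # prints 1
```

One round of remediation is enough in this example; a result of None would signal that remedial work has to continue beyond the assumed limit.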


[Figure: flow chart of testing for readiness. Specified learning outcomes lead to testing for readiness; if pre-requisite learning is adequate, developmental teaching follows; if inadequate, remediation follows and pupils are retested.]


In actual practice, a few questions relevant to the intended 
learning, are put to test pupils for concepts and skills regarded 
as essential for learning of new concepts or skills. Such 
questions are put to test their fundamental knowledge and 
understanding. These questions are used only to explore the 
possibility of remedial work, if needed, for effectiveness of 
instruction. It is during developmental teaching that testing in 
the form of questioning, assignments, home tasks, pupils’
questions and counter questions is truly integrated. The purpose 
of this inbuilt evaluation is to improve students’ learning by 
adapting, adjusting and modifying teaching-learning strategies 
for achieving the desired or intended product of learning. This 
sort of evaluation is formative in nature as its focus is on 
adjustment of ends and means to improve students’ learning.

Developmental teaching is followed by post-instructional 
testing, the function of which is certification of students’ 
achievement and their classification for other instructional and 
administrative decisions. This type of evaluation is summative 
in nature. Of course, the results of summative evaluation can 
be used for diagnostic and formative evaluation as all the three 
types of evaluation are interdependent and have the continuity 
of a cyclic process. Summative testing, therefore, can be 
undertaken after each unit, but only for feedback and applying 
unit correctives (3).




Step 3. Devising situation for testing 

Once the intended outcomes of learning are clearly defined or 
taken cognizance of and the adequacy of pre-requisite learning 
is established, the next step is to collect evidences about pupils’
learning. For this, different types of situations will have to be 
devised to obtain the desired evidence. As we have already 
emphasised the unit approach to teaching and testing, these 
evidences will have to be collected in terms of pupils’ attain- 
ment of the intended learning outcomes already formulated for 
the unit. Different types of situations will have to be identified,
depending upon the types of evidences to be collected. Selection
of a particular situation is closely related to the instructional
objectives of the unit, the nature of content and, of course,
its practicability. Different situations may be used in different
contexts and sometimes in combinations of two or more.
Situations may take the form of certain activities like reading
aloud, projects, assignments, pupils’ products, library visits,
practical work, a debate, discussion, demonstration, a test or a
quiz. Whatever situation may be devised or selected, it
must yield evidence which is valid and dependable (4). 

Every situation involves some sort of activity which may be 
used for teaching in pursuance of a particular objective. The 
same or similar activity can be used for appraisal of the 
intended learning outcomes also. For example, a simple 
activity such as drawing a diagram can be used to develop 
drawing skills among the students and at the same time it may
be used as a basis for assessing students’ drawing skills.
Similarly, many other types of activities can be made the
basis of assessment of students’ learning. In fact, it is a very
natural way of assessing students and may be appreciated
especially at the elementary stage where students can be relieved
of the fear of examinations that corrodes their mental health.
Projects can also be made the basis of an appraisal of students’
learning. Of course, here better opportunities are provided to 
the evaluators to assess not only students’ intellectual abilities 
but also their skills, attitudes and interests. Outcomes like 
those of initiative, cooperativeness, social adjustment, sense of 
responsibility etc. can be more genuinely evaluated using the 
Project as a situation. Nevertheless, in case of individual
projects one can judge more precisely the contribution of each
student in terms of his attainment on cognitive, affective and
psychomotor objectives. Likewise, an assignment may also be 
used for assessment of all those cognitive learning outcomes 
for which instruction was provided. Besides, it provides 
indicators about one’s diligence, systematic organisation, 
neatness, originality etc. Library visits indirectly show one's
interest in extra reading. An excursion or a field trip organised 
by a Biology teacher provides ample opportunities for the 
teachers to evaluate pupils’ observational skills, collecting 
skills and cooperative attitude, besides their cognitive learning. 

A testing situation is by far the most acceptable and usable
form of situation in which a group of students can be examined
more validly and reliably in the area of cognitive learning.
However, the role of tests in judging students’ interest and
attitudes cannot be overlooked. The only limitation in this 
case is the lack of technical competence on the part of teachers 
and the difficulty of interpretation of test scores. Such test 
situations may take the form of a written unit test, an oral unit 
test or a practical exercise, based on the objectives of the unit 
formulated during the first stage. 

Pupils’ products which may take the form of a chart, a 
composition, a drawing or an improvised apparatus can also be 
used as a situation for judging students' learning. Depending
upon the type of pupils’ products, outcomes like intellectual 
abilities, skills or certain aspects of personality can be 
appraised. Discussions, debates and symposia can be used as 
situations for assessment of students’ ability to express, organise, 
argue, analyse, criticise etc. 


Step 4. Obtaining needed evidences
After specifying the intended learning outcomes and choosing
the relevant situations, the next step is to obtain evidences
about pupils’ learning. This involves the following sub-stages
on the lines suggested by TenBrink (5).


(a) To locate information already available.
(b) To decide when and how to get information.
(c) To construct or select the relevant instruments of evaluation.
(d) To obtain information by using various tools and techniques.


(a) To locate information already available, all relevant 
sources should be tapped. Teachers’ records are the most 
usable ones for this purpose. It can be a student's performance
on previous unit tests, his grades on school assignments or his
attendance in various curricular or cocurricular activities. 
School results may be used for identification data and for longi- 
tudinal assessments over a period reflecting students' cumulative 
achievement. A counsellor's record may be used for collecting
information about their mental health, vocational interests,
attitudes etc. Likewise, parents' records, if any, can be profitably
used to get information about some personality aspects like
aggressiveness, emotional adjustment etc. Study habits, work 
habits and health habits are other areas which can be explored 
from such records. Other records like the ones maintained by 
a sports in-charge can also be made use of for appraisal of
physical health. 

What is it that one should look for in such information?
The most important is the relevance of information to the
desired purpose. Accuracy of information should also be
checked in terms of errors and the estimate made of such errors,
if available. How far can the information be useful in making
anticipated judgements or decisions? For this, cognizance may be
taken of the appropriateness of the referents used, evidence of
the estimated ability, the evidence of predictive validity and the
statement of the information obtained.

(b) When and how to gather information would depend
upon the type of evidence required and the use to which this
information is going to be put. There are four major techniques
of gathering data, viz. testing, observation, inquiry and analysis,
as categorised by TenBrink (1975). Under each of these techniques
various types of instruments can be used. Selection of
instruments is to be followed by timing of the process of
gathering information, specification of testing conditions and
finally the construction or selection of information gathering
instruments.




(c) If the instruments of evaluation are available with the
teachers or the school counsellor, the only task is to select the 
needed tool and apply it to get the evidence desired. However, 
care has to be taken that the instrument selected must yield 
evidence that is valid in terms of the purpose of the test, besides 
having a reasonable degree of reliability. Depending upon the
purpose, greater importance may be given to the validity if 
improvement of achievement is the focus of measurement. If 
however, grading or comparison is the aim then reliability may 
be given precedence over validity of the instrument. In case
new tools of evaluation are to be prepared, then all those 
factors that affect the validity or reliability of the measuring 
instrument must be taken into account. Ensuring reasonable
reliability and validity is the most essential feature of every tool 
of measurement and forms the scientific basis of collecting 
information about pupils’ growth. 

(d) To obtain information, tests or other tools and techniques
are to be administered to collect data. Good timing is a
prerequisite for obtaining useful information. Are students
ready to cooperate for the test or other techniques you are
going to apply? They must be prepared cognitively. They must
get opportunity to learn from such an attempt. A few unstructured
questions, testing rote memorisation of factual information,
do not really provide any clue to the evaluator to discriminate
between the good and the poor. Any judgement, therefore,
made in terms of class performance becomes deceptive.
Besides, students are expected to know what it is that they are
being evaluated for. In other words, if students are given the
instructional goals carefully written in observable terms they are 
likely to learn more from such an experience. Students should 
also be emotionally prepared to take the test or undergo any 
other examination. They should view the examination not just 
as a device for labelling them as passes or failures but more as a
device to promote their interest and to improve their achieve- 
ment. This might lead to the development of positive attitudes 
towards testing and other data gathering techniques. 
Teachers themselves should also be sure that they are ready
for the test administration. It is not uncommon to find that
when a standardised or a newly constructed instrument is to be
used, teachers sometimes overlook certain instructions, forget
needed material, ignore time-limits of a sub-test and uniformity
of instructions. It must be ensured that any extra material
needed in the form of graph, pencils, answer sheets or stop 
watch must be kept handy and ready. 


Step 5. Analysis and recording of information

Information obtained from various sources must be recorded in
a meaningful manner so that it could be analysed properly. It
is assumed that the information gathered is valid, reliable and
up-to-date. What is now required is to decide how to use this
information. Depending upon the use to which it can be put, an analysis
has to be made. Some of the relevant questions that can be
asked in this connection are:


(a) Who obtained information about whom?

(b) What is the information and when was it obtained? 

(c) How was the information obtained and how far is this 
information valid and reliable? 

(d) In what form (raw score or derived score etc.) is the 
information available and which instructional objectives 
does this cover? 


Information, if gathered by a class teacher on a unit test, is
more reliable for judging a student’s learning than for judging his
health habits, about which his parents can provide more
accurate and dependable evidence. Similarly, information
gathered on a unit test two months ago has no relevance to his
achievement now, on another unit test, from the point of view
of the use of this information for judgement making. Moreover, 
the validity and reliability of the information depends on the 
relevance of the instrument of evaluation used and the reliability 
of the scores that accrue from it. Once the information is
available, it has to be processed into a meaningful pattern so
that it could be interpreted in a desired manner. 

Information may be in the form of observation notes, test
scores, quiz scores, percentage of attendance, letter grades on
an assignment, anecdotal record on some significant behavioural
aspect, sociometric data, performance in a co-curricular activity
or any other data collected through various inquiry techniques
like questionnaire, interview, rating scales etc.

Whatever be the data, these must be recorded in a manner
that lends itself to easy manipulation. For test data, for example,
some code or serial number of the test, scheduled date,
test description, testing conditions (context), summary statistics
and comments, if any, may be provided on a sheet which becomes
a part of the file. For observational data, some format
may be desired to record notes of observation, the context in
which observation was made, limitations, if any, and the
interpretation made. For recording of data gathered from inquiry
techniques, some usable proforma may be developed to note
the evidence that accrues from rating scales, checklists,
questionnaire, interview etc. This should reflect the basis or aspects of
measurement, criteria used, mode of summarisation etc. This
must be common to the assessment mode used by all teachers.
Data on analysis of a product like a composition, an improvised
apparatus, an album, a chart or a model can be recorded on
the basis of clear description of what was measured or what
was expected in the form of a standard or criterion. The rules
of assigning marks or grades, the basis of comparison of
performance etc. are also to be considered for recording
purposes.
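
The sheet suggested above for test data can be pictured as a simple record. The sketch below is only one possible layout: the class name, the field names and the choice of summary statistics are assumptions made for illustration, and the "scheduled" entry is taken to mean the date of the test.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ScoreSheet:
    """One sheet in the file of test data (all field names are illustrative)."""
    code: str                  # code or serial number of the test
    scheduled_date: str        # assumed to be the date on which the test was held
    description: str           # what the test covers
    conditions: str            # testing conditions (context)
    scores: dict = field(default_factory=dict)   # pupil -> raw score
    comments: str = ""

    def summary(self):
        """Summary statistics for the sheet: number of pupils, mean, lowest, highest."""
        values = list(self.scores.values())
        if not values:
            return {"n": 0}
        return {"n": len(values), "mean": mean(values),
                "min": min(values), "max": max(values)}


# Hypothetical entries for a unit test.
sheet = ScoreSheet(code="UT-3", scheduled_date="1989-08-14",
                   description="Unit test on cell structure",
                   conditions="Written, 40 minutes, classroom",
                   scores={"Pupil A": 6, "Pupil B": 8, "Pupil C": 10})
print(sheet.summary())
```

Keeping the summary as a method rather than a stored entry means it cannot fall out of date when a score is corrected, which serves the requirement that records be easy to keep up-to-date.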

Whatever be the mode of recording it must be handy and 
practicable. Unwieldy records are difficult for teachers to maintain
and use and are not of much use to students either. A good
record must be simple, up-to-date and readable. Its security as
a confidential device may be maintained. Keeping in view the 
time limitation of the teachers, the procedure and proforma of 
records should be so simplified that it is easy for teachers to 
tabulate, collate, maintain up-to-date and use such records 
efficiently. Further analysis depends on this recorded data. 
Maintenance of records is a management skill which every 
teacher should acquire. 


5.0 Three Modes of Analysis
Collection of evidences is followed by analysis. In fact, analysis
begins with the tabulation of data. The type of analysis depends
upon the use to which the data are put. Sometimes we are
interested in grading, sometimes in the classification of students,
sometimes in promoting or certifying their achievement. We may be
interested more in providing remedial work to the students but 
sometimes we are more interested in judging our own instruc- 
tional efficiency in terms of instructional objectives. Perhaps, 
above all, we are more interested in improving students' learn- 
ing. In such cases, we want to analyse the evidences to identify 
inadequacies in students’ learning, shortcomings in instructional 
efforts or pitfalls in evaluation itself. Analysis, therefore, may
be done in terms of an individual or it can be done to 
judge his performance in terms of the class. It can also be done
in terms of instructional objectives or any standard of performance
may be taken as criterion.

(a) Self-referenced

This type of analysis depends on judging the performance of an 
individual in terms of his own abilities and efforts. Since every 
individual learns at his own pace and has a varied educational, 
social and experiential background, his performance cannot and 
need not be judged in terms of the class as a whole. Self- 
referenced analysis is done with reference to one's own performance
or progress. As such it provides better feedback for his
past achievement and is a good indicator of his future achievement.


(b) Norm-referenced 

A student’s achievement can also be judged with reference to his
class in the same section and even with reference to a school, a 
district or a board of school education or the state. Here we 
are more interested in judging the performance of an individual 
against a norm or average performance of the group. This is at 
present the practice in most examinations conducted internally
or externally because the purpose of such examinations is to
classify students for promotion or to grade them into different
categories such as first, second or third divisioners. As far as
learning is concerned, the comparison of a student's performance
with the group is not as significant as the knowledge
about the inadequacies in his learning. Comparison of students'
performance in terms of a norm or a group is useful when we
are interested in the comparison of institutions or the selection
of students.


(c) Criterion-referenced

Another way of looking at the performance of a student is to 
compare his performance with reference to some pre-determined 
criteria e.g. instructional outcomes. Intended instructional out- 
comes may be visualised in terms of an instructional objective 
of a unit, a particular level of proficiency in a skill or a minimum
mastery-level of achievement. Unlike norm-referenced
comparison in which we may be satisfied with four correct
spellings out of 10, the criterion-referenced approach emphasises
that a minimum of 8 or 9 out of 10 spellings can be
considered a desirable level of performance. If we are interested
in bringing up our students to a particular level of achievement
to enable them to derive maximum benefit from further instruction,
this approach should be adopted. It envisages analysis of
evidences in terms of a pre-determined, acceptable level or
standard of performance. This approach envisages that every
individual is supposed to reach almost the same intended level 
of achievement or performance although some may take more 
time than others to reach that level. Thus analysis may be done 
in terms of the individual, with reference to the criterion of 
performance. 
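
The three modes of analysis can be contrasted on the same score. The sketch below is illustrative only: the function names and the class scores are assumptions, and the criterion of 8 out of 10 follows the spelling example given above.

```python
from statistics import mean


def self_referenced(previous_score, current_score):
    """Gain of a pupil over his own earlier performance."""
    return current_score - previous_score


def norm_referenced(score, class_scores):
    """Difference between a pupil's score and the class average."""
    return score - mean(class_scores)


def criterion_referenced(score, criterion):
    """True when the score reaches the pre-determined standard."""
    return score >= criterion


class_scores = [3, 4, 4, 5, 4]        # hypothetical spelling scores out of 10
pupil_previous, pupil_now = 2, 4

print(self_referenced(pupil_previous, pupil_now))     # the pupil has gained 2 marks
print(norm_referenced(pupil_now, class_scores))       # the pupil is at the class average
print(criterion_referenced(pupil_now, criterion=8))   # False: below 8 out of 10
```

The same four correct spellings thus appear as progress in the self-referenced mode, as average in the norm-referenced mode and as inadequate in the criterion-referenced mode.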


Step 6. Forming judgements and taking decisions


Evaluation is a process of making value judgements. Therefore,
analysis of evidences is followed by interpretation of the evidences
to form judgements. There are various types of judgements
which the teachers form. Grading of students, diagnosing
their weaknesses, predicting success in the examinations etc. are
day-to-day judgements that the teachers make. Based on such
judgements they take decisions regarding promotion, remedial
work or changes in instructional strategies.

Interpretation depends on the conversion of scores, grades,
or any other type of evidences into more meaningful terms so
that the students and parents can understand it. For this, comparison
of evidences in terms of some referent is essential. Such
a referent may be an instructional objective, a class or the
individual himself. Accordingly, we may have three bases of
judgement: self-referenced judgements, norm-referenced judge- 
ments and criterion-referenced judgements, as already mention- 
ed in the previous pages. Self-referenced judgements are based 
on comparison of performance in terms of one’s own rate of 
progress while norm-referenced judgements involve comparison 
of a student's performance in terms of the group, a class, 
district, state or national level. Criterion-referenced judgements 
on the other hand, refer to some sort of criterion which may be 
an instructional objective, a performance standard or a learning 
outcome. These three bases of judgements form one dimension 
of judgement-making.

The second dimension of judgement-making concerns the 
type of judgements. For example, an individual's performance 
can be judged in terms of his deviation from the norm, like that 
of a class performance or school performance. It means the
extent to which an individual deviates from the average perfor- 
mance of a group to which he belongs. Such judgements which 
emphasise one's position in terms of deviation are called
deviation judgements. All our classroom judgements and 
Board results are examples of this type. The other mode of 
forming judgements can be in terms of mastery learning. 
According to this approach, a student is to be judged in terms 
of the mastery of what is learnt. In such cases it is expected
that the student acquires mastery beyond 75 or 80% level of the 
intended level of performance or the pre-determined criterion 
of expected learning outcomes (6). Therefore, 8 correct spellings 
out of say, 10, might be considered as a minimum level of 
mastery in one case while the development of 10 out of 10 
component skills needed to use a clinical thermometer efficiently 
may be considered essential for mastery in another case. 
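
The two types of judgement can be put side by side in a short sketch. The text does not say how deviation is to be expressed; expressing it in standard-deviation units is an assumption made here, as are the function names and the sample scores. The mastery levels of 8 out of 10 and 10 out of 10 follow the two cases mentioned above.

```python
from statistics import mean, pstdev


def deviation_judgement(score, group_scores):
    """Deviation of a score from the group mean, in standard-deviation units."""
    spread = pstdev(group_scores)
    if spread == 0:
        return 0.0                      # every pupil in the group scored the same
    return (score - mean(group_scores)) / spread


def mastery_judgement(achieved, total, required_fraction=0.8):
    """True when the proportion achieved reaches the required mastery level."""
    return achieved / total >= required_fraction


group = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical class scores
print(deviation_judgement(6, group))     # half a standard deviation above the mean

print(mastery_judgement(8, 10))                          # spellings: 8 of 10 suffices
print(mastery_judgement(9, 10, required_fraction=1.0))   # thermometer: 9 of 10 does not
```

A deviation judgement says where the pupil stands relative to the group; a mastery judgement says only whether the pre-determined level has been reached, whatever the group has done.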

The third dimension refers to the purpose of judgement- 
making. Accordingly, judgements may be made to estimate 
one's present status, position or achievement. It may also be a
predictive judgement, where the purpose is to make prediction. What
is needed is the selection of the predictor, the collection of
evidences on the predicted variables and the estimate of the 
relationship between two sets of data. Therefore, judgement 
forming is closely related to the referent selected and the mode
of interpretation. In day-to-day classroom assessment on
teachers' assignments and tests, values are assigned in the form
of grades as A, B, C, D, E, etc. Such judgements are made
sometimes according to normal curve or as percentages of
grades, scores, level of achievement or grading on a standard
(mastery) judgement or self-referenced judgements.
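
For a predictive judgement, the relationship between the two sets of data has to be estimated in some way. The text does not name a statistic; the sketch below assumes the product-moment (Pearson) correlation coefficient as one common choice, and the function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(predictor, predicted):
    """Pearson correlation between two equally long lists of scores."""
    n = len(predictor)
    if n != len(predicted) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(predictor) / n
    mean_y = sum(predicted) / n
    dx = [x - mean_x for x in predictor]      # deviations on the predictor
    dy = [y - mean_y for y in predicted]      # deviations on the predicted variable
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("one of the lists has no variation")
    return covariance / spread


# Hypothetical data: unit-test scores (predictor) and annual examination
# scores (predicted variable) for the same five pupils.
unit_test = [12, 15, 9, 18, 14]
annual_exam = [48, 60, 40, 70, 55]
print(round(pearson_r(unit_test, annual_exam), 2))
```

A coefficient near +1 would suggest that the unit test is a useful predictor of the annual result; a coefficient near zero would suggest that it is not.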


6.1. Making Decisions


Learning to make judgements for a better position is essential
for a class teacher because it is on the basis of these judgements
that decisions are taken. In fact, decision-making is the ultimate
aim of evaluation. Faulty judgements can lead to wrong decisions.
It is, therefore, essential that judgements should be
verified first and only then decisions arrived at. The accuracy
of information may be checked to verify judgements. For this,
relevance of data gathering instruments and their reliability are 
two important aspects to be checked. Judgement-making 
becomes useless if the error of measurement is appreciable. 
Further empirical evidences may be collected in such cases or 
be prepared for review of a decision. 

Decision-making is a complicated process and requires a
critical approach to avoid situations which later on demand
reversal of decisions made earlier. Faulty decisions may also be
due to imprecise statement of instructional objectives, taking
cognizance of only a few alternative solutions, failure to visualise
each alternative, possible use of invalid judgements and overlooking
professional ethics. Therefore, whenever decisions are
to be taken about grading, certification, remediation, change in
instructional strategy, etc., teachers should be cognizant of the
following steps:


(a) Specify your objective clearly in terms of expected or desired
learning outcomes.
(b) Identify all possible alternatives and probable outcomes of
each alternative.
(c) Visualise the likely consequence of each of the courses of
action visualised, along with probable outcomes.
(d) Choose the best alternative by giving appropriate value to
different alternatives and weightages to probable consequences.
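
Step (d) above can be worked through as a small calculation. The text does not prescribe any formula; the sketch below assumes that each probable consequence is given a weightage (its judged likelihood) and a value (its desirability), and that the alternative with the highest weighted total is chosen. All names and figures are hypothetical.

```python
def weighted_value(outcomes):
    """Weighted value of one alternative.

    outcomes is a list of (weightage, value) pairs, where weightage is the
    judged likelihood of a consequence and value is how desirable it is.
    """
    return sum(weightage * value for weightage, value in outcomes)


def choose_alternative(alternatives):
    """Return the name of the alternative with the highest weighted value."""
    return max(alternatives, key=lambda name: weighted_value(alternatives[name]))


# Hypothetical alternatives for pupils found weak in spelling.
alternatives = {
    "re-teach whole class": [(0.6, 5), (0.4, 1)],
    "individual tutoring":  [(0.8, 8), (0.2, 2)],
    "extra assignment":     [(0.5, 6), (0.5, 3)],
}
print(choose_alternative(alternatives))   # prints: individual tutoring
```

The figures themselves remain matters of professional judgement; the calculation only makes explicit which alternative those judgements favour.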


Judgement, therefore, is a very responsible job and requires
critical acumen besides the technical competence required to
process the data in terms of pre-determined purposes. Faulty
judgements are the result of unreliable and invalid evidences,
or due to incognizance of the purposes for which the judgements
are made. Likewise, wrong judgements lead to wrong decisions,
thereby wasting all efforts used in collecting relevant evidences
about pupils’ learning.


Step 7. Summarising and reporting
Once the judgements are formed and decisions are made, two
possibilities emerge. If everything is all right, summarisation and
reporting follow. If there are certain inadequacies in the
achievement of students then diagnosis and remediation may be
taken up before summarisation. Summarisation involves the
types of evaluation reports made to communicate to its users.
Learning the technique of summarising the results and application
of specific rules to communicate the results to those who
are interested in the improvement of pupils’ learning are, therefore,
essential features of reporting.

Single numerical scores comprising the subject-wise data,
which is in vogue, smack of an authoritarian approach and a
bureaucratic convenience of recording, filing and storing records
in the past. Complete exclusion of data gathering devices other
than testing was perhaps the main reason in the past why
teachers’ assessments were outweighed by His Majesty's
Inspectors. Now with the use of a variety of evidences of
different aspects of students’ growth, summarisation and recording
have become more cumbersome and technical.

Properly summarised evaluation data provides the basis for
making further judgements and decisions by administrators,
parents, teachers and other users. Good summaries are good
indicators for discovering the cause of unplanned instructional
efforts at a particular time. Moreover, they are really used for
judgements made or decisions taken.

The first requisite for preparing a good summary is to
identify the clientele. Is it for the parents, teachers, students,
parent-teacher meetings or for some other purpose? The 
summary should indeed be made by the teachers and for the 
teachers as these are to be used, by and large, by them to 
improve students' learning and their own teaching. The
purpose of the summary, or the use to which it is to be put,
determines the type of report to be prepared. A letter, a
report card, a diagnostic report, a conference, an action report 
etc., are different techniques for summarisation. The mode of 
summarisation depends on the use to which the report is to be 
put. It may concern placement in a remedial class, shortage of 
attendance, an award for the best student, the timing of comple- 
tion or the beginning of a unit, etc. 

The next step in reporting is the listing of the decisions
made. Only the critical decisions should be included, in order, 
from the most recent one to the remotest one. Each decision 
conveyed must be supported by the judgements of the teachers 
who influenced those decisions. Lastly, mention must also be
made of the information that led to a particular judgement to 
indicate the basis of judgement-making. Thus summarisation 
includes critical decisions based on judgements formed and 
supported by information that led to those decisions.
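
The structure of such a summary (critical decisions only, the most recent first, each with its supporting judgement and information) can be sketched as a small record. The class, its field names and the sample entries below are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Decision:
    made_on: date
    decision: str
    judgement: str        # judgement that supports the decision
    information: str      # evidence that led to the judgement
    critical: bool = True


def summarise(decisions):
    """List critical decisions, most recent first, each with its basis."""
    chosen = sorted((d for d in decisions if d.critical),
                    key=lambda d: d.made_on, reverse=True)
    return [f"{d.made_on.isoformat()}: {d.decision} "
            f"(judgement: {d.judgement}; based on: {d.information})"
            for d in chosen]


# Hypothetical decisions about one pupil.
log = [
    Decision(date(1989, 7, 10), "Placed in remedial spelling group",
             "Weak in spelling", "4 out of 10 on unit test"),
    Decision(date(1989, 9, 2), "Returned to regular group",
             "Mastery reached", "9 out of 10 on retest"),
    Decision(date(1989, 8, 1), "Seat changed", "Routine", "None", critical=False),
]
for line in summarise(log):
    print(line)
```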

Summarisation is a precursor to reporting. A report means 
communicating the summarised evaluation data to the users. It 
may be formal or informal, written or oral, pre-planned or 
emergent. Only a very clear, unambiguous and easy-to-understand
report can be appreciated by its users. It must include all
the minimum information essential to the intended users. If a 
statement of what is being reported clearly indicates whether it 
is about the behaviour, class performance or a parents’ meeting 
it becomes more meaningful. Guidelines must be provided for
interpretation of the information symbols. Each report should 
be meaningful and actionable for the recipient. For example to 
say, “Your ward is not getting on well", is meaningless for the 
parents. But to say, “He is weak in English spellings when 
compared to his classmates", is more meaningful because the
parents can take some action in the form of giving dictation or 
spelling tests at home. Reporting is, therefore, the service- 
component of the evaluation process on the basis of which a
relationship among teachers, students and parents is established. 


7.1. Diagnosis and Correctives 


As pointed out earlier, summarisation and reporting follow a
decision strategy, if everything goes well in terms of students’
learning. However, if certain inadequacies in students’ learning
come to the surface then further diagnosis, followed by
remedial work, precedes summarisation and reporting. This
step indeed refers to the pedagogical aspect rather than the
measurement aspect of evaluation. Diagnosis may be focussed
on the teaching-learning process or more often on the pupils.
Diagnosis followed by various types of correctives should in fact 
be a regular feature of any formative tests. Diagnosis is, there- 
fore, of necessity imposed by the judgements from the evalua- 
tion data. Diagnosis, indeed, should be considered a sacred 
service of the teaching-learning process (7) and thus becomes 
integral to the instruction. Various types of correctives can be 
applied, depending upon the nature of difficulties diagnosed. 
Correctives no doubt relate to the pedagogical aspect as their
focus is on the improvement of instruction vis-a-vis students’
learning. There are various types of correctives like alternative
textbooks, workbooks, flash cards, re-teaching, audiovisual
material, programmed instruction, tutoring etc., which can be
applied (8). Such correctives can be individual-based or group- 
based depending upon the nature and magnitude of errors 
observed. Some of these correctives can be used either for the
whole class, for a smaller group or for an individual. These
correctives can be involvemental correctives, if a learner is himself
involved in the learning process, e.g. a self-study session.
Sometimes presentational correctives can be used which are
good for a group-based plan of instruction. They focus primarily
on presenting the new material like the re-teaching of a
module. The choice of a corrective is, therefore, of primary
importance for improving students’ learning. Further details of
diagnosing pupils' weaknesses and the related remedial measures
will be discussed more thoroughly in a separate chapter on
diagnosis and remediation.




Step 8. Feedback and correctives

Since the focus of evaluation is on the improvement of students'
learning and instruction, the evaluation process cannot be
considered complete unless the evidence is made use of for this
purpose. The feedback mechanism provides loops to each and
every component of the teaching-learning process. As such,
feedback can be made use of in the following manner (9):


(a) To judge the efficacy of the objectives 

Test results can be examined in terms of the objectives set up.
Constantly poor results in relation to a particular objective, say
the application objective, may indicate that the objective is
perhaps pitched too high for the students and needs
modification. Other things remaining the same, if most of the
students do exceptionally well on a test, it might reflect the
lower level of the objective, which may be suitable for a lower
grade. Similarly, if the results are analysed in terms of the specific
learning outcomes (behavioural outcomes) implied by each
objective, the analysis can also serve as a basis for modification,
deletion or addition of certain outcomes of learning, depending
upon the level of students' performance on each.
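The reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name, the objective names, the sample figures and the two cut-off values (0.4 and 0.9) are assumptions made for the example and are not prescribed by the text.

```python
def review_objectives(results, low=0.4, high=0.9):
    """Flag objectives whose results suggest a need for modification.

    `results` maps each objective to the fraction of students who
    performed satisfactorily on the items testing it.
    `low` and `high` are hypothetical cut-offs chosen for illustration.
    """
    flags = {}
    for objective, fraction in results.items():
        if fraction < low:
            flags[objective] = "possibly pitched too high"
        elif fraction > high:
            flags[objective] = "possibly pitched too low"
        else:
            flags[objective] = "no change indicated"
    return flags


# Invented figures for one unit test.
unit_results = {"knowledge": 0.95, "understanding": 0.70, "application": 0.25}
flags = review_objectives(unit_results)
for objective, remark in flags.items():
    print(f"{objective}: {remark}")
```

A flag of this kind is only a prompt for the teacher's judgement. As the text notes, it holds only when other things remain the same.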


(b) To improve curriculum

Analysis can also throw light on the suitability or otherwise of
the curriculum. Too difficult or too easy areas of content
may lead to their modification or even their total exclusion
from the course. Over-emphasis or under-emphasis on a
particular topic, misplacement of a concept, sequencing and
reorganisation of certain units, proper grade placement of
certain concepts etc. may require further investigation. All
these improvements relate to curriculum redevelopment.


(c) To promote learning

Assessment of pupils' performance helps to understand the
level and learning of various concepts. Immediate feedback of
results to students, teachers and, if needed, to the parents as
well, also goes a long way in the improvement of students'
achievement and the reinforcement of what is learnt. Moreover,
knowledge of inadequacies provides them direction for further
improvement. The most significant contribution that results from
such feedback is the development of a positive self-concept among
the pupils. Continuous feedback of the pupils' adequacies acts
as a great motivating force for further learning.


(d) To improve instruction

Results of measurement indicate the emphases laid in teaching.
The quality of teaching is reflected indirectly in pupils' learning.
Therefore, on the basis of students' performance, the teacher
can rectify his past mistakes and prevent their recurrence in
future. The performance level of the students on various
instructional objectives indirectly shows the success, failure or
limitation of the methods and materials used in instruction.
Likewise, content-wise analysis of performance may also indicate
whether a unit or a module has been learnt well, failing which it
might require re-planning and re-organisation of activities for
future instruction.

(e) To provide guidance

Guidance without measurement loses its scientific character,
while measurement without guidance loses its purpose to a
great extent. Results indicate the areas of successes and
failures, on the basis of which future success can be predicted.
Poor performers are identified and their weaknesses are located,
so that necessary guidance can be provided. Study habits are
conditioned by methods of examination. On the basis of
scores, students can be guided in the selection of the right courses
to help them make the right vocational choices later. Provision for
individualised instruction for certain students and enrichment
programmes for other individuals can also be planned and
organised.


(f) To promote research

Action research is the means of finding out the truth regarding
questions of practical importance concerning the job, and not
research dealing with theoretical questions or eternal truths.
Advantage can be taken of the analysed data by using them to
solve certain problems like the impact of teaching methods,
choices of questions in a question paper, types of options,
allotment of marks for individual questions etc. Problems like
these can be solved by undertaking classroom research and
using the data for improvement of learning and instruction.
Once the feedback is done, the process of evaluation with
respect to a unit of teaching comes to an end. Then follows the
next unit of teaching.

In this integrated model of evaluation, one may appreciate
that evaluation of students ends neither with the collection
of measurement data nor with the analysis and judgement-making
functions, but goes beyond that to feed back the results to all
the components of the teaching-learning process. It is this use
of the results of evaluation that helps the teacher to reformulate
instructional objectives, provides a basis for reorganisation of
learning experiences, identifies teaching-learning efficiencies
and even judges the effectiveness of the evaluation programme
itself. One thing which needs to be appreciated is that in
formative evaluation, an attempt is made to integrate teaching,
learning and testing. Teaching provides the validating referent
while testing becomes an inseparable part of the teaching-learning
process. Therefore, the use of evaluation as a device for
furtherance of instructional efficiency as well as for improvement
of students' learning should be considered as the ultimate aim
of the teacher-evaluator.

The relationship among the various steps of evaluation
discussed above can be shown in the form of an integrated model
of formative evaluation (10).

[Figure: Integrated model of formative evaluation. Box labels:
Unit 1; Specify outcomes of learning; Test for readiness;
Objective based teaching; Devise test situation; Obtain evidences;
Analyse and record; Form judgements; Take decisions; Inadequacy,
followed by further diagnosis and remediation (if necessary);
Summarise and report; Feedback for next unit; Unit 2.]


CHAPTER VII


TOOLS AND TECHNIQUES OF EVALUATION


1. Introduction


Information gathering is the first step in evaluating students.
It requires the use of various data gathering devices. These
include different types of tools and techniques which are applied
to collect various types of evidences. A number of tools are
employed relating to various techniques like those of testing,
observation, inquiry etc. What is important is the selection of
the right device for collecting the needed information and the
quality of data that accrue from the application of those tools
and techniques. An attempt will be made to give here a brief
description of these devices and their role in data gathering. It is
expected that readers, after going through this chapter, will be
able to

(a) describe the nature, scope and limitations of various tools
    and techniques,

(b) classify the various tools under different techniques of
    evaluation,

(c) identify the relationship between the types of evidences
    possible through application of techniques like testing,
    observation, inquiry and analysis,

(d) select and apply the appropriate tools and techniques to
    collect the needed evidences about pupils' learning.


2. Testing Techniques


A technique refers to the mode of collecting evidences about
pupils' growth or achievement, whereas a tool is an implement
or instrument that helps us to extend our observation by bringing
more precision to the collection of information or data. Thus a
technique involves the use of certain tools for data collection
or summarisation of data. The four basic techniques of
evaluation are: testing, observation, inquiry and analysis. Each of
these techniques, when applied, involves the use of various tools.
The following flow chart makes the relationship between
the two clearer:


Evaluation  <------>  Testing  <------>  Question paper
(Process)            (Technique)         (Tool)


There are two types of evaluation devices, namely, those for data
gathering and those for data summarisation. For data collection
these may be testing techniques or non-testing techniques like
observation, inquiry and analysis. With each technique a number of
tools are associated. Some of the more commonly usable tools
are categorised and listed under the testing and non-testing
techniques in the form of a flow chart (see annexure), followed
by a discussion of the more commonly used techniques.


2.1. Written Examinations


Teacher-made tests could be written, oral or practical. It is the
written tests which have been considered more reliable for
measuring learning outcomes in the cognitive domain as also
for the affective domain. In the cognitive domain, written
tests are in vogue in all subjects. Their prominent use in
evaluating students at all levels is due to their better reliability
than the other two types. In spite of the fact that the present
written examinations suffer from many defects, they are still the
most acceptable device for judging students' achievement and even
for selection of students for various admission tests. This does
not, however, mean that written tests as they are now constructed,
administered, scored and used are quite reliable and valid
measures of students' achievement. They need to be planned
more scientifically, developed more technically and interpreted
more cautiously.

The written tests more commonly used are the achievement
tests, but other varieties like diagnostic tests, criterion-referenced
tests and selection tests are also in use, besides attitude scales,
interest inventories etc. Criterion-referenced tests and
diagnostic tests require separate discussion. For planning
and development of achievement tests, a more detailed
discussion is warranted, as teachers, paper-setters and evaluators
are all concerned with the use of such tests. As such, the
next chapter is wholly devoted to the planning and development
of achievement tests. Thus more attention is given here to
oral testing and to non-testing techniques, viz. observation,
inquiry and analysis.


2.2. Oral Examinations


Oral examinations are perhaps the oldest form of evaluating
students, prevalent in most countries till the beginning of the
18th century. Till the invention of the printing press, the oral
word used to be the only means of communication as also the
only medium of testing students.

Although in countries like the USSR oral examinations
still have a permanent place, the fact remains that all over
the world written examinations are so much in vogue that they
have almost relegated oral testing to the background. One
major reason for it is the ease of construction, administration,
scoring and interpretation of written examinations and the
possibility of maintaining better validity and reliability of
expected responses as compared to oral examinations. (1)

Nevertheless, oral examinations have their own place and
cannot be replaced by written examinations. Oral examinations
demand oral evidence, although questioning can be oral or
written. Of course, recording of evidence has to be in writing
except in cases where the group of examiners is extremely
small. Oral examinations may be classified as under:


[Chart: Classification of oral examinations. By subject: language
based subjects (English, Hindi, mother tongue) and content subjects
(Science, Maths, Social Studies). By agency: external or internal.
By mode: individualised or group based. By examiner: single
examiner; the remaining examiner label is illegible in the source.]


(i) Purpose

The purpose of oral examinations is fourfold:

(a) To test oral skills like listening, speaking, reading and
    expression (oral) which cannot be tested through written
    examinations.

(b) To confirm and probe further the extent of evidence gathered
    through written devices whenever desired, as in a
    project, a unit test or a viva voce.

(c) To judge the extent to which such skills as are warranted
    by the nature of the subject are developed, like
    argumentation in history or observational skills in biology.

(d) To make a quick oral review for informal assessment of
    what the pupils have learnt and what their deficiencies are.


Thus an oral examination may be used to test content as
well as manner. Its advantage is that oral skills, work
experience, project work, functional understanding about practical
skills etc. can be validated more profitably through oral testing,
which acts as an external moderation device. For younger
children, it is an indispensable testing device which provides
on-the-spot feedback about students' learning and instructional
strategies, thereby providing a basis for remediation. Likewise,
students weak in written expression can be assessed better
through oral devices.

In the fifth conference of the Chairmen and Secretaries (2) of
the Boards of Secondary Education, held at Azad Bhawan,
New Delhi, in 1964, the then Education Minister, Sh. M.C.
Chagla, stressed the need for developing oral expression, which
is very essential in day-to-day living. This ability is desired in
every walk of life and in every interview one faces for
employment. Despite realisation of the need for the development of
oral skills, the fact remains that unless these skills are evaluated
in external examinations they are not going to attract the needed
attention of teachers. Besides, subjectivity in assessment, the
need for a greater number of examiners and for their specialised
skills in testing, say, pronunciation, inter-examiner variance,
the time-consuming nature of oral testing, the difficulty of
objective recording of evidences, ensuring equivalence of
questions, subjective interpretation etc. are a few among the many
problems and difficulties in using oral examinations at the
external level. However, as an informal assessment device, oral
questions in all subject areas would continue to be taken
advantage of by class teachers for diagnosis and remedial
programmes.


(ii) Improving the validity and reliability of oral examinations 
Since by and large oral examinations are to be used in the field 
of languages, it is necessary that for systematic evaluation the 
following procedure may be tried: 


(a) Identify the objectives (listening, speaking, reading,
    expression).
(b) Delineate the aspects of evaluation (say, oral expression).
    In oral expression, aspects like language and content
    elements may be identified.
(c) Define oral expression as under: 
The pupil 
(i) uses appropriate words in the right context. 
(ii) uses the correct sentence pattern. 
(iii) pronounces words correctly. 
(iv) modulates his voice according to the situations. 
(v) speaks at a reasonable speed. 


3. Observational Techniques


Next to testing, the most commonly used technique is the
teacher's observation. Observation is specific, planned,
systematic and recorded by an expert, as contrasted with casual
seeing. It is a process in which the observer listens, looks at or
notices the significant elements of a product or performance of
a student, and is aimed at collecting information about a
person's cognitive, affective or psychomotor behaviour. Certain
skills like oral reading, listening, speaking, dancing, singing etc.
cannot be measured through paper-and-pencil tests but can be
measured through the observational technique. Other examples
where this technique can be used are attitudes, interests, habits,
leadership quality, social adjustment etc. Skills which are
either themselves observable, like singing, dancing and speaking,
or yield observable products, like those of painting, writing or
drawing, lend themselves better to the use of the technique of
observation. It is an excellent technique to obtain evidence
about the typical behaviour of students as they take a test, play
or behave in a situation. Observation is one of the most
convenient techniques and can be used profitably, provided it is
properly planned with a specific purpose, systematically carried
out, carefully focussed, thoroughly recorded and cautiously
interpreted by an expert. (3)

Though this technique is quite subjective in nature,
its validity and reliability can be improved if
care is taken in the construction and use of the instruments of
observation. Frequent observations at regular intervals by the
same observer, or independent observations by more than one
observer at the same time, can definitely reduce subjectivity in
the assessment of typical behaviours. Moreover, the barrier
between the observer and the observed can be removed by making
observations in a natural setting, under normal conditions,
without making the subject of observation conscious that he is
being observed.


3.1. Instruments of Observation


For recording of information collected through observation, a
number of devices can be used. The major instruments of
observation are checklists, rating scales, rankings, score cards and
scaled specimens. Of course, tools like the lens, the [...]scope,
the camera, the stop watch, the thermometer etc. are used
to aid observation.


(i) Checklists

A checklist is one of the simplest devices, consisting of a prepared
list of items which provide information as to whether or not a given
characteristic is present. It is just like a laundry list, in which the
presence or absence of each item can be indicated or recorded in
the form of 'Yes' or 'No'. Half-way responses are not admissible in
such a list. The statements are given in specific and functional
terms. While preparing the checklist we have to analyse the job,
the task, the skills etc. into various components which should
be included in the list. Such a checklist facilitates and
systematises the recording of observations.


An example of a checklist on study skills

Can I read silently?
Can I consult the dictionary?
Can I read at a reasonable speed?
Can I relate ideas in the para?
Can I grasp the central idea of the para? etc.

Students are asked to tick-mark the statements which they
think they can answer in ‘Yes’ and cross-mark those to which they
would like to respond in ‘No’. The analysis indicates the
adequacies and inadequacies in study skills and provides a basis
for teachers and students for self-evaluation.
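The analysis of such a checklist can be sketched as follows. This is a minimal illustration in Python: the function name and the sample responses are invented for the example, while the items are those of the study-skill checklist above.

```python
# Items of the study-skill checklist quoted in the text.
ITEMS = [
    "Can I read silently?",
    "Can I consult the dictionary?",
    "Can I read at a reasonable speed?",
    "Can I relate ideas in the para?",
    "Can I grasp the central idea of the para?",
]


def summarise_checklist(responses):
    """Split Yes/No responses into adequacies and inadequacies.

    `responses` maps each item to True (tick, 'Yes') or False (cross, 'No').
    Anything else is rejected, since half-way responses are not admissible.
    """
    for item, answer in responses.items():
        if not isinstance(answer, bool):
            raise ValueError(f"Only Yes/No is admissible for: {item}")
    adequacies = [item for item in ITEMS if responses.get(item) is True]
    inadequacies = [item for item in ITEMS if responses.get(item) is False]
    return adequacies, inadequacies


# Invented responses of one student, for illustration only.
sample = dict(zip(ITEMS, [True, True, False, True, False]))
strong, weak = summarise_checklist(sample)
print("Adequacies:", strong)
print("Inadequacies:", weak)
```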


(ii) Rating scales

The rating scale involves a qualitative description of a limited
number of traits of a person or aspects of a thing, in contrast to
a checklist which shows merely the presence or absence of a
trait. Important aspects are selected and the observer's
attention is directed to those specific aspects. This instrument
ensures more accuracy and objectivity in the judgements to be made
by the observer. We can use this tool for observing a person's
behaviour, performance or the end products of his performance.
We can have 3-point, 5-point or 7-point rating scales,
depending upon the refinement and practicability. (4) In a
classroom situation a 3-point or 5-point rating scale is more
commonly used and is more usable. Such scales can be:


(a) descriptive, when a person is rated by a descriptive phrase
    like, say, excellent, good, average, fair, poor (5 point),

(b) numerical, when a numeral is assigned to each descriptive
    phrase: 4, 3, 2, 1, 0 (5 point, corresponding to terms like
    always, frequently, occasionally, rarely, never),

(c) graphical, in which case descriptive phrases are represented
    on a line at various points in the form of a continuum,
    say, ranging from 0 to 100. The student is to check mark
    on that line, at the appropriate place, as shown here.


    0      20      40      60      80      100
    |-------|-------|-------|-------|-------|


The difficulty of defining a trait or a characteristic to be
evaluated, the halo effect (the carrying over of a judgement
from one trait to another) and the tendency of raters to be generous
in rating are some of the glaring limitations of rating scales.
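A numerical rating scale of the kind described under (b) can be sketched in Python as below. The numerals 4 to 0 and the five phrases come from the text. The function names and the sample ratings are assumptions made for illustration, and averaging several observations is only one simple way of combining them.

```python
from statistics import mean

# Five-point numerical scale as given in the text.
SCALE = {"always": 4, "frequently": 3, "occasionally": 2, "rarely": 1, "never": 0}


def to_numeral(phrase):
    """Convert a descriptive phrase to its numeral on the 5-point scale."""
    return SCALE[phrase.strip().lower()]


def average_rating(phrases):
    """Average the numerals of several independent or repeated ratings."""
    return mean(to_numeral(p) for p in phrases)


# Invented ratings of one trait by three observers.
ratings = ["frequently", "always", "occasionally"]
avg = average_rating(ratings)
print(avg)
```

Combining ratings in this way does not remove the halo effect or generosity in rating. It only smooths the differences between individual raters.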


(iii) Anecdotal record

An anecdotal record is a verbal snapshot of a single but
significant event, recorded in its setting to describe observations
made by the observer (5). These descriptions are a factual record
and not interpretations of what happened. It is a ‘time lapse
recording’, like a single-frame camera. Such
records are useful for collecting information about the social
adjustments of students on a variety of behaviours which are
typical. A teacher can keep on recording such information in
anecdotal record cards and can make use of this information
whenever a student's record of typical behaviour is
called for. Such records are to be interpreted very cautiously,
especially keeping in view the context or setting in which a
particular event or behaviour occurred. They form the observer's
record of behaviours for use in forming judgements about
students' behaviours.


(iv) Rankings

Ranking is a crude observational tool which is used simply to rank
students on the basis of impressions and is therefore quite
subjective. The students or the objects of measurement are
ranked on the basis of the degree to which they possess or
do not possess particular characteristics. Ranking may be done
on the basis of:

(a) grouping, where a predetermined number of groups are
    formed, e.g. a product may be ranked into such groups as
    ‘average’, ‘above average’ and ‘below average’,

(b) the paired-comparison method, in which case a student is
    compared to every other student and the rater makes a
    judgement as to which of the two is better. This technique can
    be used for ranking individuals, objects and products.
    It is a rough but easy to use technique. Its disadvantage
    is that the significance of a rank is dependent on
    the size and nature of the group. For example, a rank
    ‘third’ out of a group of 10 is not the same as that out of a
    class of 40 students. Likewise a ‘third’ rank in a group
    of 25 gifted children and a ‘third’ rank in a group of 25
    below average students carry altogether different
    connotations. This technique, being of limited use, should be used
    only when a quick and rough estimate is needed and
    a relatively small group of individuals or objects is
    involved.
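The paired-comparison method can be sketched in Python as follows. This is an illustration only: the pupil names, the impression scores and the rule of ranking by the number of comparisons won are assumptions made for the example, with the `better` function standing in for the rater's judgement.

```python
from itertools import combinations


def rank_by_paired_comparison(names, better):
    """Rank individuals by the number of paired comparisons each wins.

    `better(a, b)` stands for the rater's judgement and must return
    whichever of the two is judged better.
    """
    wins = {name: 0 for name in names}
    for a, b in combinations(names, 2):  # every pair is compared once
        wins[better(a, b)] += 1
    ranking = sorted(names, key=lambda n: wins[n], reverse=True)
    return ranking, wins


# Invented impressions of four pupils, standing in for a rater's judgement.
impression = {"Asha": 3, "Ravi": 1, "Meena": 2, "Suresh": 4}


def judge(a, b):
    return a if impression[a] > impression[b] else b


ranking, wins = rank_by_paired_comparison(list(impression), judge)
print(ranking)

# Number of judgements needed for a group of n: n * (n - 1) / 2.
pairs_for_40 = 40 * 39 // 2
```

The number of judgements grows quickly with the size of the group: 4 pupils need 6 comparisons, whereas a class of 40 would need 780. This is one practical reason for keeping the method to relatively small groups.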


(v) Score cards

A score card provides for appraisal of a larger number of aspects
or traits and carries the advantages of both the checklist and the
rating scale. As in a checklist, different aspects of assessment are
identified and listed. Each aspect is weighted by giving it a
predetermined point value, as in a rating scale. A numerical value
is given to the various rating points, say 4, 3, 2, 1, 0 (5-point
rating scale). When any object is to be rated, say, the review
exercises in a textbook, the weighted score for each aspect as
well as the total weighted score can be used to evaluate the object
observed (6):


Example: A review exercise of a textbook, containing 10
questions


Aspects                               Weightage   Ratings   Aspect-wise
                                                  (0-4)     weighted score

(a) Coverage of objectives               25
(b) Coverage of content                  25
(c) Preciseness of language              10
(d) Clarity of scope                     15
(e) Appropriateness of directional
    words                                15
(f) Difficulty level                     10

                                        100       Total weighted score = 215


Score cards are normally designed for evaluating communities,
schools, textbooks and the socio-economic status of families. As
with the rating scale, the limitation of score cards is the
difficulty of choosing the aspects to be appraised, and of
identifying and quantifying them. Moreover, the fact that the whole
is greater than the sum of its parts is also a point for consideration.
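The arithmetic of a score card can be sketched in Python as below. The aspects and weightages are those of the review-exercise example in the text. The ratings are invented for illustration and are not the ones behind the total printed in that example, since the individual ratings are not given there.

```python
# Aspects and weightages from the review-exercise example (total 100).
WEIGHTAGE = {
    "Coverage of objectives": 25,
    "Coverage of content": 25,
    "Preciseness of language": 10,
    "Clarity of scope": 15,
    "Appropriateness of directional words": 15,
    "Difficulty level": 10,
}
MAX_RATING = 4  # ratings run from 0 to 4 on the 5-point scale


def weighted_scores(ratings):
    """Return the aspect-wise weighted scores (weightage x rating)."""
    scores = {}
    for aspect, weight in WEIGHTAGE.items():
        rating = ratings[aspect]
        if not 0 <= rating <= MAX_RATING:
            raise ValueError(f"Rating for {aspect!r} must lie between 0 and 4")
        scores[aspect] = weight * rating
    return scores


# Hypothetical ratings, chosen only to show the calculation.
ratings = dict(zip(WEIGHTAGE, [3, 2, 2, 1, 3, 2]))
scores = weighted_scores(ratings)
total = sum(scores.values())
maximum = sum(WEIGHTAGE.values()) * MAX_RATING
print(f"Total weighted score = {total} out of {maximum}")
```

With weightages adding up to 100 and a top rating of 4, the highest possible total is 400, which gives a yardstick against which any total weighted score can be read.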


4. Inquiry Techniques


Next to testing and observation, the most usable technique is
inquiry, the instruments of which help the teacher to collect
information about an individual's perception of himself or of
others in a more objective and systematic manner. The
questionnaire, schedule, sociometric devices, projective
techniques, attitude scales and inventories are the more commonly
used devices.

(i) Questionnaire

A questionnaire is a set of written questions listed in a manner
that can be read and responded to by teachers, students or other
respondents when mailed. If it is administered in person and the
information is gathered through a face-to-face interview, it is
called a schedule. (7) A questionnaire is usually duplicated and
mailed to a large number of respondents to be filled up and
returned. Its focus is on obtaining information about the
opinions and attitudes of individuals or even about what an
individual did or might do in a particular situation. A questionnaire
may be of the ‘closed form’ or the ‘open form’ type. In the former
case, the respondent is asked to mark 'Yes' or 'No' or to
check an item, while in the latter we call for a free response
in the language of the respondent and thus provide for greater
depth of responses. Most questionnaires include both forms,
as the following three questions illustrate:


1. Which of the two vocational courses, medical and engineering,
   do you prefer? (Closed form)

2. Why did you choose a medical course in preference to an
   engineering course? (Open form)

3. Which of the two courses of studies, medicine or engineering,
   do you like more? Give reasons for your choice.
   (Combination of the two)


Questionnaires are useful tools to collect opinions from
various types of respondents, in a comprehensive manner,
involving less expenditure. However, the disadvantage is that
what the students perceive, say, about the relevance of a
syllabus, may not be what actually is. Questionnaires yield
perceptions of what is and not necessarily what actually is.
Nevertheless, the teacher can change the perceptions of the students
by changing his methodology of teaching after analysing their
perceptions as evidenced in their responses to the questionnaire.
Another limitation is that respondents usually do not fill in the
questionnaire seriously and also try to show themselves as better
than they actually are, or than what they feel, in respect of the
aspects asked for in a questionnaire.

An inventory is a special kind of questionnaire used for
getting self-reports and is very highly structured. It provides
a checklist of behaviours in the form of likes, dislikes, habits,
opinions etc. An attitude scale is another variety of
questionnaire, which represents a combination of the rating scale and
the self-report questionnaire. The respondent is asked to match
his ‘feelings’ with those described in the scale, which is usually
given in the form of a continuum ranging from “strongly
agree” to “strongly disagree”. The respondent reads the
statement and tick marks the point which indicates whether he
strongly agrees, agrees, is undecided, disagrees or strongly
disagrees (the commonly used form).


(ii) Interview

An interview is a face-to-face conversation with a purpose,
which can be informal and unstructured or formal and highly
structured. In a way, it is an oral questionnaire in which the
interviewee gives the needed information verbally in a
face-to-face relationship. A good interview calls for proper scheduling,
a natural setting, adequate planning of questions, establishing
rapport with the interviewee, lonely conduct of interview,
systematic recording of responses and cautious interpretation
of responses. An interview provides the teacher with information
about the attitudes, opinions, self-perception and
typical behaviour patterns of students when conducted in the
usual routine of interaction between the teacher and his
students. An interview helps the teacher to keep track of
changing perceptions, attitudes, personal relationships etc.

This technique is useful while dealing with children,
illiterates and those with language difficulties, besides those
students who have personal and social adjustment problems. A
deep penetration into the reasons for actions, feelings and
attitudes is possible through this technique with a skilful
interviewer who can motivate the learners. However, this technique is
time consuming and interviewer-biased, and needs expertise to
ensure objectivity, sensitivity and insight on the part of
interviewers. The unique advantage of this technique, however, is
that the teacher can obtain effective information of a highly
personal nature about students.


(iii) Sociometric instruments

Sociometry is a technique of describing social relationships
among individuals in a group. Sociometric instruments are
designed to collect evidence about the social acceptance of
individuals within a group as also about perceived relationships
existing within that group. Children can be asked to nominate
a child or give the names of three or so, in order of preference,
concerning, say, invitation for an excursion, a monitor for the
class, the captain of the hockey team, a desk mate, a project
group etc. It is a peer rating, rather than a rating by teachers
or superiors. Moreno (8) launched sociometry in 1934.
Different types of devices are usable.


(a) Guess-who technique (Hartshorne and May, 1928).
Children are asked to name an individual who fits a certain
description. An instrument describes a number of individuals
and the respondents (students) are to guess who is being
described. It may be a loner, a leader, the class clown, the class
brain, the trouble maker or the compromiser. Statements like the
following may be given, followed by the question, “Guess who?”

This boy is always worried.
This boy is always picking quarrels with others.
This boy is never prepared to work in a group project.
This boy likes to be left alone all the time.


(b) Social distance scale (E.S. Bogardus). This scale attempts to 
measure the degree to which an individual or a group of 
individuals is rejected, or accepted by another individual or a 
group. Various situations are scaled and score values are 
established which range from complete acceptance to com- 
plete rejection. An individual is asked to check mark his 
position by selecting one of the points on the scale. For 
example, a student in a classroom situation can choose one of 
the following statements do check his position: 


1. I would like to have him as my best friend. (Complete acceptance)

2. I would not mind having him as my class monitor. (Partial acceptance)

3. I wish he were not in my class. (Rejection)




The scale can be of 5 or 7 points for a more precise measure of
acceptance or rejection.


(c) Placement devices. To discover perceived relations, a situation
is described pictorially or verbally and the individuals are asked
to place the members of the group in different positions which
are described. For example, a map or diagram may be given,
depicting a playground, and the students may be asked to put
the names of their class fellows in different positions indicating
where they would most likely be found during recess.

Although it is difficult to interpret sociometric data,
the information gathered is useful for making decisions about
grouping of students for various activities. The use of such
devices at regular intervals helps a lot in making self-referenced
judgements about students' personal social adjustments.
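
The nomination procedure described above can be tallied in a few lines. The following is only a minimal sketch: the pupil names, the function name `sociometric_scores` and the weights of 3, 2 and 1 for first, second and third choice are assumptions made for illustration, not a scoring scheme prescribed in this handbook.

```python
from collections import defaultdict


def sociometric_scores(nominations, weights=(3, 2, 1)):
    """Return a weighted choice score for every pupil.

    nominations: dict mapping each chooser to an ordered list of chosen
    peers (first preference first).
    weights: points given for the 1st, 2nd, 3rd ... choice (assumed values).
    """
    scores = defaultdict(int)
    for chooser in nominations:
        scores[chooser] += 0  # pupils nobody chooses still appear, with 0
    for chooser, chosen in nominations.items():
        for rank, peer in enumerate(chosen[:len(weights)]):
            if peer != chooser:  # ignore self-nominations
                scores[peer] += weights[rank]
    return dict(scores)


# Hypothetical class of four pupils choosing companions for an excursion.
nominations = {
    "Asha": ["Ravi", "Meena", "Kiran"],
    "Ravi": ["Meena", "Asha", "Kiran"],
    "Meena": ["Ravi", "Asha", "Kiran"],
    "Kiran": ["Ravi", "Meena", "Asha"],
}
scores = sociometric_scores(nominations)
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for pupil, score in ranked:
    print(pupil, score)
```

A pupil with a very low score is not necessarily rejected; the tally only shows who was chosen, and it needs the careful interpretation mentioned above.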


(iv) Projective techniques 

Projective devices are used to enable a subject to project his
inner feelings, needs, values or wishes to an external object.
The subject reveals himself unconsciously while reacting to the
external objects. They are designed to find out about personality
and social adjustment characteristics of individuals. The
individual is shown some pictures and told to describe what he
sees, and his responses are categorised and analysed. When
ambiguous stimuli are given, it is assumed that the individual
projects his biases, feelings and idiosyncrasies. Since these
instruments are highly unreliable, they must be used with the
utmost care and caution.

Projection may be sought through: 


(a) association, in which the individual is to report what he
sees, thinks or feels when he is presented with a picture, an
inkblot, a phrase or a word. The Rorschach inkblot test and
the Thematic Apperception Test are familiar examples. (9)

(b) completion, in which the respondent is asked to complete
a sentence or a task e.g. My greatest ambition in life is....

(c) role playing, in which the subject is asked to act or
improvise a situation according to the role assigned to him.
Frustration, insecurity, hostility, prejudice etc. can be
observed by this technique.

(d) creative and constructive work, whereby the subject is asked
to model clay, paint, play with toys or write imaginative
stories on the given topic. Deep-seated feelings may be
inferred by noting the choice of colour, form, words,
orderliness, evidence of tension etc.


5. Instruments of Analysis 


Analysis can be used independently or along with an observational
technique. Content analysis can be used to analyse the
student's work as he learns. Written or spoken communication
can be analysed for the presence or absence of certain
characteristics, say, misspelled words or a creatively used verb. These
are counted and compared with some standard to make a
judgement about the quality of communication. The most
important instrument of analysis is the analysis of projects and
assignments which specify the important characteristics in
advance. Three types of projects and assignments are significant
from the point of view of analysis. TenBrink gives the
following three types of assignments (10):


(i) Acquisition assignments 

These assignments are prepared to help students to acquire new
information or skills. The following examples are relevant in
this regard:


(i) Study unit 5 and identify the new terms introduced for the
first time OR study the spelling words for the Monday test.

(ii) Memorise your table of multiplication for 6 x 10.

(iii) Read chapter 3 to know different modifications of root.

For evaluative information, the many variables that affect
learning are to be controlled; otherwise it would be
difficult for the teacher to find out how information is
acquired.

Such assignments help to know what new knowledge and
skills have been learnt, how fast the students learnt and what
kinds of errors they make during acquisition.




(ii) Review assignments 

In order to reinforce the learned information and skills, review
is essential. Recurrent use of concepts in science textbooks
or practice in using newly acquired words, reading skills etc.
helps the learner to retain that information and those skills. Well
designed review exercises enable the teacher to know the
process a student uses for review as also the level of
original acquisition. Analysis of review assignments helps the
teacher to know about unexpected outcomes also. Ask students
to show sums in Mathematics or a composition in English.


(iii) Transfer exercise 

Knowledge and skills acquired in a subject help a student to
know more about himself even in other situations in real life.
Transfer exercises should enable students to transfer their
learning (knowledge and skills newly acquired) to new learning
situations. What are the conditions under which transfer is
facilitated or inhibited? They provide a wide variety of
situations to try out newly learned knowledge and skills.
Transfer exercise must be followed by a review exercise
because it is necessary to know what information or skills the
students have retained before you can judge the transferability
of the various concepts, skills etc.

Analysis of documents, pupils' products, compositions,
content, classroom questions etc. are all of great diagnostic
value for improving instructional and learning strategies. This
technique provides information about learning outcomes during
the learning process in respect of mainly cognitive and
psychomotor skills. It is quite objective in nature but not stable over
time. It is inexpensive but preparation time is long, though
crucial.
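
The counting step of content analysis mentioned earlier in this section (count a characteristic, then compare the count with a standard) may be sketched as follows. This is an illustration under stated assumptions only: the word list, the pupil's sentence and the 10 per cent standard are hypothetical, and a real analysis would use the teacher's own spelling list and criterion.

```python
import re


def count_misspelled(text, known_words):
    """Count words in text that are absent from the list of known spellings."""
    words = re.findall(r"[a-z']+", text.lower())
    misspelled = [w for w in words if w not in known_words]
    return len(misspelled), len(words)


# Hypothetical spelling list and pupil's sentence.
known_words = {"the", "plant", "makes", "food", "in", "its", "leaves"}
errors, total = count_misspelled("The plant maks food in its leavs", known_words)

error_rate = errors / total
STANDARD = 0.10  # assumed standard: at most 10% of words misspelled
acceptable = error_rate <= STANDARD
print(errors, total, round(error_rate, 2), acceptable)
```

The same pattern applies to any countable characteristic; only the function that detects the characteristic changes.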


6. Relevance to Outcomes of Learning 


To have a comparative view, we can summarise the data about
all the four techniques in tabulated form as under:


1. Testing
   Types of evidence possible: student's achievement; cognitive
   learning outcomes; psychomotor outcomes.
   Strengths: most objective and reliable; more information
   available per unit time.
   Limitations: most expensive.

2. Observation
   Types of evidence possible: process of performance; products of
   performance; affective outcomes; psychomotor outcomes; social
   interaction.
   Strengths: inexpensive; usable in most of the situations;
   provides typical direct observable evidence.
   Limitations: subjectivity in assessment; time-consuming.

3. Inquiry
   Types of evidence possible: opinions; self-perceptions; social
   perceptions; attitudes; subjective judgements.
   Strengths: less expensive.
   Limitations: not stable over time; lengthy preparation time,
   but crucial.

4. Analysis
   Types of evidence possible: process; psychomotor and cognitive
   skills.
   Strengths: inexpensive and objective.
   Limitations: subject to bias and errors; time-consuming.


[Chart: EVALUATION DEVICES. For data collection: testing techniques
(standardised tests, including achievement tests, diagnostic tests,
aptitude tests, attitude scales, interest inventories and
personality inventories, and teacher-made tests) and non-testing
techniques under the heads Observation, Inquiry and Analysis.
For summarisation: cumulative record cards and progress reports.]


CHAPTER VIII


GENESIS OF A QUESTION


1. Introduction 


A question is a stimulus which evokes the intended response.
According to the New Webster Dictionary (1), it is an inquiry,
an investigation or an interrogative sentence soliciting an answer
from an individual with some particular purpose. When applied
to testing it may be defined as the smallest unit of a test. This
word is sometimes used interchangeably with an item. Each
statement of a true/false test or each blank in a completion test
represents a question or an item. An exercise may consist of
one or more questions. A question may be viewed as a single
isolated question as well as a constituent unit of an exercise or
a question paper. In either case the impact of a question would
differ, as also its characteristics. Thus the connotation of a
question varies not only with reference to the context in which
a particular question is used but also with reference to its
existence as an independent unit or a part of a test. It is,
therefore, desired that as a result of going through this chapter the
teacher will be able to:


(a) understand the concept of a question,
(b) identify the bases of classifying questions into different
categories,
(c) relate various mental processes involved in developing a
paradigm of a question,
(d) explain the genesis of a question,
(e) describe various characteristics of a good question, and
(f) apply relevant criteria to judge the quality of a question.


2. All Questions are Not Questions 


The word question when used in the general sense does not in 
fact relate to an act of questioning. It may take different forms 
other than questioning. Let us observe the following examples: 


(a) List three characteristics of living organisms. 

(b) State three laws of motion. 

(c) The capital of Mauritius is...... 

(d) Justify the introduction of open book examinations at the 
Higher Secondary stage in your state. 

(e) The Cabinet Minister for Human Resources Development is
(A) V.P. Singh
(B) Narasimha Rao
(C) K.C. Pant
(D) Krishna Sahi

(f) Why do you put a lot of salt to preserve pickles? 


All these 'questions' are not questions in the real sense of the
word. The first one requires enumerating; the second one
requires only a statement; the third one demands completion of
a sentence; the fourth concerns eliciting only positive arguments
and the fifth one requires selection of the correct answer. It is
only the last one which indeed appears to be a question. Thus
if different types of so-called questions are categorised, we
would find that not all of them involve query and the process
of questioning. However, the word 'question' in educational
parlance has come to stay with the connotation of an item or a
constituent unit of a test or an exercise demanding the
examinees to do this and to do that, and is not limited to an inquiry
or questioning as in the case of the last question.


3. The Basic Source 


The basic source of every question is the curriculum content or
the syllabus prescribed. This does not, however, mean that one
cannot frame a question independent of a syllabus. What I
intend to convey is that every question has its root in some
subject-content. Therefore, when one thinks of a question in
terms of its purpose one has to think whether that question is
to be used for an instructional purpose or for an evaluation
purpose. Accordingly, the question can be development-oriented
or judgement-oriented. If it is development-oriented, then it
can be used as a teaching device and if it is judgement-oriented
it may be used as a testing device. However, the focus of both
would be on measurement vis-a-vis attainment of an
instructional objective.


4. The Context 


The purpose of a question varies from context to context. If the
context is teaching, the purpose of a question could be
reinforcement, revision or ..., diagnostic, review or improvement
of the teaching... If the context is examination, the purpose
may be to judge the attainment of an ..., ... achievers, to
classify the students for ... or to judge curriculum effectiveness.


5. Facets of a Question 


There are umpteen ways of categorising questions, depending
upon the basis which one uses for classifying. In the
context of testing, depending upon the purpose, ... be used for
improvement of students' learning ... for certification of their
achievement. ... may use a question for grading ... reinforcement
of learning. As a device, a question ... practical. A question
may ... of testing, observation or inquiry. As ... in the
form of a question, statement or an activity. It may be of the
supply-type, selection-type or of the rearrangement-type
depending upon the format of response. It may be categorised
... response depending upon the scope of expected responses.
Its content of testing may be a fact, a concept or a process.
It may be based on a ... an application objective. It
may represent molecular, cellular or ecological levels (in case
of a bio-question). It may be set at an easy, average or a
difficult level.

Thus in a testing situation (context) a question can be set
for grading (function) students to certify (purpose) their
achievement with the help of written words (mode) using a
diagram (medium) by means of the supply-type (format)
demanding restricted response (scope) of short answer
(form) type based on a concept (content element) to judge
students' ability to interpret (objective) and can be attempted by
50 to 60% of students (average difficulty). Most of the hidden
facets of a question as indicated in the above statement can be
depicted in the form of the scheme given below.


Facets of a Question

Context:           Instruction         Testing              Textbook exercise
Function:          Diagnosis           Grading              Reinforcement
Purpose:           Improvement         Certification        Selection
Mode:              Oral                Written              Performance
Medium:            Word                Diagram              Activity
Format:            Selection type      Supply type          Rearrangement type
Scope:             Extended response   Restricted response  Fixed response
Form:              Long answer         Short answer         Objective type
Content element:   Fact                Concept              Process
Objective:         Knowledge           Understanding        Application
Difficulty level:  Easy                Average              Difficult

Example: What does this graph indicate regarding growth of yeast cells?

[Graph: number of yeast cells plotted against time in hours.]

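
The facets listed in the scheme can also be treated as a small record that is checked before a question is filed. The sketch below is only illustrative: the names `FACETS`, `invalid_facets` and `yeast_question` are hypothetical, "oral" as the first mode is an assumed reading of the chart, and classing "ability to interpret" under the understanding objective is likewise an assumption.

```python
# Each facet and its three permitted values, as read from the scheme.
FACETS = {
    "context": ("instruction", "testing", "textbook exercise"),
    "function": ("diagnosis", "grading", "reinforcement"),
    "purpose": ("improvement", "certification", "selection"),
    "mode": ("oral", "written", "performance"),  # "oral" is an assumed reading
    "medium": ("word", "diagram", "activity"),
    "format": ("selection type", "supply type", "rearrangement type"),
    "scope": ("extended response", "restricted response", "fixed response"),
    "form": ("long answer", "short answer", "objective type"),
    "content element": ("fact", "concept", "process"),
    "objective": ("knowledge", "understanding", "application"),
    "difficulty level": ("easy", "average", "difficult"),
}


def invalid_facets(profile):
    """Return the facet names that are missing or hold a value not in FACETS."""
    return [name for name, allowed in FACETS.items()
            if profile.get(name) not in allowed]


# Profile of the yeast-growth graph question described in the text.
yeast_question = {
    "context": "testing",
    "function": "grading",
    "purpose": "certification",
    "mode": "written",
    "medium": "diagram",
    "format": "supply type",
    "scope": "restricted response",
    "form": "short answer",
    "content element": "concept",
    "objective": "understanding",  # assumed home of "ability to interpret"
    "difficulty level": "average",
}

print(invalid_facets(yeast_question))  # an empty list means every facet is set
```

Recording the facets in this way makes the hidden decisions behind a question explicit, which is the point of the scheme.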
6. The Paradigm of a Question 


Whatever may be the question, it requires a lot of insight for
its development. What are the underlying assumptions while
framing a question? What sort of answers do we expect in
terms of responses to such a question? What situation is to be
provided to elicit those responses? How can the gap between
the intended and the actual responses be minimised? These are
some of the questions which the item writer must ask himself
before starting to write an item. It is seldom that item
writers remain cognizant of all such mental operations which
need to be integrated into a desired configuration to provide
the appropriate stimulus to evoke the intended responses. It is
possible only when the examinee uses the same mental process
as intended in the question (process response) and the response
(product response of the examinee) is correct.

The best question is one in which the discrepancy between the
process response and the product response is minimum. This is
possible only when one develops a new item in accordance with
a paradigm that depicts the relationship between the underlying
assumption, objective, content, situation, wording and nature
of responses. The paradigm given in a diagram appearing on
the next page may perhaps provide the needed genesis for writing
a question.


7. Genesis of a Question 


From the paradigm given below it is not difficult to infer that 
the structure of a question is closely related to its functionality. 
What usually happens is that the phenotype (outward appear- 
ance) of a question does not always reflect its genotype (genetic 
material or constitutional make-up). 

Thus a question has two faces; one which is manifested in 
the form of observed format i.e. wording of stem and alter- 
natives, say in a multiple choice question (phenotype). The 
other face is represented by its built-in characteristics (geno- 
type) of the question forming its substantive structure. This 
structure comprises a set of characteristics which are regarded
as symptomatic of its potentiality to evoke the desired responses
in a testing situation. When the phenotype (manifested responses)
is the same as embedded in the genotype of the question it
is a case of complete congruence between the intended and
observed responses. However, phenotypic (observed) responses
of a question may be different in spite of the same genotype.
This is a case of discrepancy between the intended and the
observed responses. Likewise the genotype of two questions
may be different but the phenotypes (observed responses) may
be the same. This incongruency may be due to poor wording of
a question or misconception of the objective being tested. The
best question is one whose phenotype is congruent with its
genotype.

[Figure: PARADIGM OF A QUESTION. Labels shown in the diagram:
Assumptions, Time, Marks, Testing, Teaching, Single, Composite,
Sp. Ability, Dif. Level, Situation, Form of Question, Wording,
Substance, Intended Response, Scope, Direction, Format, Mechanics,
Model Answer, Editing.]

The following table would indicate the various
genotypes of a typical question reflecting three of its major
dimensions, namely, the objective, content element and form of
question.


K = Knowledge         FAE   CAE   PAE
U = Understanding     FAS   CAS   PAS
A = Application       FAO   CAO   PAO
F = Fact              FUE   CUE   PUE
C = Concept           FUS   CUS   PUS
P = Principle         FUO   CUO   PUO
O = Objective type    FKE   CKE   PKE
S = Short answer      FKS   CKS   PKS
E = Essay             FKO   CKO   PKO


Genotypic table of a question (cognitive domain)


The three basic characteristics which form the genotype of
a question in the cognitive domain are the 'content', 'objective',
and the 'form' of question. The content may be a fact (F), a
concept (C) or a principle (P). The objective may be knowledge,
understanding or application. The form of a question may be
the objective type (O), the short answer type (S) or the essay
type (E). Thus we have (3 x 3 x 3) genotypes of a typical
question, leading to 27 types of questions which can be framed.
This includes 9 varieties of knowledge questions, 9 of
understanding and 9 of application as shown in the above table.
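
The count of 27 follows from taking every combination of one content element, one objective and one form. A short sketch of this enumeration is given below; the three-letter codes follow the order content, objective, form used in the table, while the names `genotypes` and `describe` are only illustrative.

```python
from itertools import product

CONTENT = {"F": "Fact", "C": "Concept", "P": "Principle"}
OBJECTIVE = {"K": "Knowledge", "U": "Understanding", "A": "Application"}
FORM = {"O": "Objective type", "S": "Short answer", "E": "Essay"}

# One code per combination: content letter, objective letter, form letter.
genotypes = ["".join(code) for code in product(CONTENT, OBJECTIVE, FORM)]

# The nine varieties testing knowledge have "K" as the middle letter.
knowledge = [g for g in genotypes if g[1] == "K"]


def describe(code):
    """Spell out a three-letter genotype code in words."""
    c, o, f = code
    return f"{FORM[f]} question testing {OBJECTIVE[o].lower()} of a {CONTENT[c].lower()}"


print(len(genotypes), len(knowledge))
print(describe("CUS"))
```

Listing the genotypes in this way gives a ready checklist when a question paper is being balanced across content, objectives and forms.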


8. Characteristics of a Good Question 


... relevant content ...
however, ensure that cognizance of these three would serve the
purpose for which it is developed. It has to be couched in such
a language that it provides the needed direction to the students
to evoke the intended responses. The first criterion of a good
question, therefore, is its relevance to the intended genotype.
Secondly, it must evoke intended responses in terms of its
genotype. Thirdly, it must provide responses which are
acceptable at a particular level in terms of its criterion. Fourthly, it
must ensure the maximum congruence between the process
responses (intended) and the product response (observed).
Fifthly, in terms of its efficiency, it should yield appropriate
facility and discrimination indexes. Lastly, it must be
economical in terms of time requirement for maximum coverage.
Whatever may be the criteria of a good question, the best
question is one which evokes those very mental processes to
attempt it correctly which are intended by the authors of the
question.


9. Dynamics of a Question 


A question plays a significant role in the process of teaching
and learning by providing the basis for interaction among its
users. It can engender interaction among students, teachers,
peers, parents, paper-setters, examiners, examinees and
administrators. A question can be used by the students for self-learning,
revision of a lesson or for self-evaluation. A teacher can use it
for development of a lesson, for reviewing a lesson and for
diagnosing inadequacies in learning. A paper-setter can use a
question for setting a standard, classifying pupils and certifying
pupils' achievement. An examiner may look to acceptable
responses, uniform scoring and objectivity in marking. An
examinee tries to get insight into the intended response,
understand the intent of the author and guess the nature of the
intended response. The peers try to discuss the expected answers,
compare their responses with the intended ones, and use it for
self-evaluation. Parents judge their wards' performance to
identify gaps in their learning and coach their wards in terms
of expected questions. Administrators judge performance
standards of pupils, quality of examination and effectiveness of
the instructional process. (2)

Thus the interaction indicated above highlights the importance
of a question, which forms the basis of interaction among
its clientele. Therefore, when a question is used in different
situations it provides interaction of various types. This is,
however, possible only when its user employs it at the proper time
in an appropriate manner for a predetermined purpose in the
given context.

10. Judging a Question 


All methods of judging a question are subjective. Moreover, a
question behaves differently when considered as an independent
entity and when it is a constituent unit of a test.
A question may not turn out to be a good question
in spite of the best reflective judgement made by experienced
teachers and subject specialists. This is because every individual
has a different background and technical know-how of the
content element and the objective being tested. The same
question is characterised as knowledge based by some and as
application based by others. This again depends on the
assumptions made by the critic about the quality of instruction
provided to the students. When a question is tried out to get the
empirical validity, the same problem arises because it gives
different results with different groups of students taught by
different teachers. A question in terms of facility index and
discrimination index may be very good as an individual
question used with a particular class. But the same question, when
judged in terms of a different class, may lose its potential and
be rated as more difficult or easier, having a lower or higher
discrimination index, the question remaining the same.
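
How the same question can yield different indices with different classes may be seen from a small computation. The sketch below assumes the commonly used definitions, which are not spelt out in this section: facility index as the proportion of pupils answering correctly, and discrimination index as the difference between the numbers correct in the upper and lower groups divided by the group size. The group fraction and both sets of class data are hypothetical.

```python
def item_indices(responses, fraction=0.27):
    """Return (facility index, discrimination index) for one question.

    responses: list of (total_test_score, item_correct) pairs, where
    item_correct is 1 for a right answer and 0 otherwise.
    fraction: share of pupils placed in each of the upper and lower
    groups (0.27 is a common convention, assumed here).
    """
    ranked = sorted(responses, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper_correct = sum(correct for _, correct in ranked[:n])
    lower_correct = sum(correct for _, correct in ranked[-n:])
    facility = sum(correct for _, correct in ranked) / len(ranked)
    discrimination = (upper_correct - lower_correct) / n
    return facility, discrimination


# The same question tried out with two hypothetical classes of ten pupils.
class_a = [(95, 1), (90, 1), (85, 1), (80, 1), (70, 1),
           (60, 0), (55, 1), (40, 0), (35, 0), (30, 0)]
class_b = [(95, 1), (90, 1), (85, 1), (80, 1), (70, 1),
           (60, 1), (55, 1), (40, 1), (35, 1), (30, 0)]

fac_a, disc_a = item_indices(class_a, fraction=0.3)
fac_b, disc_b = item_indices(class_b, fraction=0.3)
print(fac_a, disc_a)
print(fac_b, round(disc_b, 2))
```

With these made-up data the question is of average difficulty and highly discriminating for the first class, but easy and weakly discriminating for the second, although the question itself has not changed.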

In a norm-referenced test, a question with a low discrimination
index of, say, 0.1 may be considered unsuitable because it
gives very little variance between the low and the high
achievers. But when the same question providing the same
index of discrimination is visualised as a part of a
criterion-referenced test, it may be considered suitable because it shows
low variance, which indeed is the focus in such tests for
identifying the masters and non-masters. Similarly, all good
questions when put together in a question paper do not necessarily
make it a good one. In fact, it may result in a bad paper: for
example, when all good application questions are included in a
question paper meant for a heterogeneous group. An easy
question paper may lower its validity for one group of students but
may prove difficult for a different group and vice-versa. An
application question proves to be a simple knowledge question
for students taught with emphasis on higher level objectives.
Thus a good or a bad question is good or bad in its own
niche or context. The quality of a question, therefore, can be
judged only in terms of a particular group of students and the
quality of instruction they received. One cannot judge the
quality of a question only through test results. The best way,
therefore, is that the classroom teachers should construct quality
items with the help of fellow teachers teaching the same subject
in the same institution, followed by the use of those questions
in classroom examinations and improving them further through
analysis of responses of examinees. Once the teachers are well
trained in the technique of item-writing, long-term and
short-term targets can be fixed for development of item pools which
can be used later for teaching and testing.

11. To Sum Up 


A question is a useful but dangerous tool in the hands of
incompetent teachers and evaluators. For its effective use in
teaching, learning and evaluation, one has to understand fully
its dynamics. A bad question is a reflection on the author's
limitation of technical competence and mastery of the structure
of the discipline. Unless the item writer is able to integrate
content elements and specific abilities properly, it will not be
possible for him to produce a good question. Using a badly
framed question is professionally unethical as it creates
mistrust and wrong conclusions about instructional efficiency and
the pupils' learning. Framing a good question is not
everybody's cup of tea, and passing judgement on the quality of a
question without knowing the context, focus and purpose is
unethical. I have tried to raise a number of questions about a
question in this paper.


CHAPTER IX


CONSTRUCTED QUESTIONS (SUPPLY TYPE)


A. LONG-ANSWER QUESTIONS


After going through this chapter it is expected that the reader
will be able to:


(a) know the difference between objective based and objective
type questions,
(b) classify the given questions into long answer, short answer
and objective type using the prescribed criteria,
(c) give examples of extended response and restricted response
type of questions testing different abilities,
(d) improve the defective questions of each type by using the
principles of framing questions,
(e) analyse the factors affecting subjectivity in marking long
answer questions and the associated steps for improving
their scoring objectivity,
(f) construct short answer and very short-answer questions
using relevant criteria to test different abilities,
(g) detect errors in the given questions of short answer and
very short-answer type and improve them using the
relevant principles of construction.


1. Objective-based Versus Objective Type Questions 


In the previous pages we have described the various tools and
techniques that can be used to obtain evidences about pupils'
growth in terms of expected outcomes of learning. By and
large teachers are more concerned with the collection of
evidences in terms of instructional objectives towards which
all-out efforts are made in the classroom using varied types of
teaching-learning strategies. How many teachers take
cognizance of the objectives or the intended learning outcomes
while writing a question is anybody's guess. The need for
identifying the intended objectives before framing a question,
therefore, cannot be over-emphasised. Indeed,
every question is based on one or the other objective but the
fact remains that the item writer is seldom conscious of the
objective he is going to test for. A question may require the
students to write a long or short answer or just select an
answer out of the 4 or 5 designated responses, as in the case of
an objective type question. An objective type question refers
to the form of the question whereas an objective-based question
refers to the base of the item i.e. the instructional objective on
which it is based. Thus an objective-based question may be
of the long answer, short answer or the objective type variety.
Every question of the objective type is, therefore,
objective-based but every objective-based question need not be of only
the objective type.


1.1. Classification of Questions


Depending upon the nature of expected responses and mode of
attempting a question, we can classify all questions into two
categories viz., supply type and selection type, the former
requiring students to supply the answers while the latter
demands only the selection of correct responses out of the given
ones. Since the nature of responses in the supply-type questions
would vary, ranging from the very extended response variety to
that of the very restricted form requiring students to supply
just one word, we may classify them on a continuum of
response length. The following list of questions would further
illustrate this viewpoint.


Questions:
1. Write an essay on the uses of plants.
2. Write an essay on the uses of medicinal plants.
3. Describe four uses of medicinal plants.
4. Describe two uses of medicinal plants with examples.
5. List four main uses of medicinal plants.
6. List four plants of medicinal use.
7. Name any two medicinal plants.
8. Name one medicinal plant.
9. Which medicinal plant gives quinine?
10. Which of the two plants, Cinchona or Citrus, gives quinine?
11. Does the Cinchona plant give quinine?
12. Which of the following plants yields quinine?
    A. Aspergillus
    B. Cinchona
    C. Neem
    D. Citrus


From the above examples, it is clear that on the basis of 
restriction imposed on the degree of freedom of response, one 
can classify various types of questions as depicted below: 


Teacher made tests

    Free response type
        Extended response variety:    Long answer type
        Restricted response variety:  Short answer type

    Fixed response type
        Supply type:                  Very short answer type
        Selection type:               Objective type

Objective Based Questions


All these forms of questions are based on the nature of
responses and the mode of answering the questions. Each of
these types of questions is discussed hereinafter.


2. Essay-type Questions 

The most commonly used tools of evaluation are the essay-type
questions, although the other two types viz., short answer and
objective type are gaining prominence day by day due to their
inherent advantage of being scored more objectively and
interpreted more easily. Despite the wider applicability of
the short answer and objective type questions, there are certain
outcomes of learning e.g. to organise, summarise, integrate ideas
and express in one's own way, which cannot be satisfactorily
measured through objective type tests. The importance of
essay tests lies in the measurement of such instructional
outcomes.

Essay tests have become the focus of criticism for those who
are interested in the science of measurement as in achievement.
Due to this, essay tests have been left undeveloped in spite of
the fact that they are still the most widely used form of test by
teachers. They have some unique, potential educative values.
Therefore, the need for developing these tests and undertaking
research in this field cannot be underrated.


2.1. Two Varieties of Essay-type Questions 


Essay tests may be of the extended response type or the
restricted response type depending upon the limits of the
expected response. An essay question, unlike an objective
question, is not an all-or-none affair but rather a matter of
degree. On the one extreme, an essay question may give full
freedom to the students to write any number of pages. On the
other hand, a restriction may be imposed on the length of the
response by limiting the scope of the expected answer. This
restriction may be imposed to such an extent that the required
response may vary from one word, a phrase or at the most a
sentence, to a few lines or even a few paras. This limit may be
imposed by restricting the content and the length of students'
response in the statement of a question. Restricted response
type items are quite useful for learning outcomes which require
interpretation, application of data or outcomes which are
specific and clearly defined in nature. Such type of questions
help to reduce subjectivity in marking, for which essay tests
are infamous.

In extended response type questions, full freedom is given 
to the student to exercise his competence and demonstrate the 
best he possesses, of course pertaining to the area of the 
subject. He is free to select, organise, integrate, evaluate and 
express in any way he likes or deems appropriate. Such 
questions although useful for measuring global type of abilities, 
are not suitable for measuring specific learning outcomes, 
besides being difficult to grade. 


2.2. Classification of Essay Questions 


Monroe and Carter (1923) divided essay questions into 21
types (3). They include questions on (1) Selective recall, (2)
Evaluative recall, (3) Comparison of two things, single basis, (4)
Comparison of two things, general, (5) Decision, for or against,
(6) Cause or effect, (7) Explanation of the use or the exact
meaning of some word, phrase or statement in a passage,
(8) Summary of some unit of the text or of some article read,
(9) Analysis, (10) Statement of relationship, (11) Illustration or
examples (the pupil's own) of principles in science, construction
in language etc., (12) Classification, (13) Application of rules,
laws or principles to new situations, (14) Discussion, (15)
Statement of an author's purpose in his selection or organisation
of material, (16) Criticism, (17) Outline, (18) Reorganisation of
facts, (19) Formulation of new questions, problems and questions
raised, (20) New procedures, (21) Inferential thinking.

Weidman (1922, 1941) distinguished 11 definable types of 
tests arranged in series from the simple to the complex with 
directional words as under (4): 


(1) what, when, who, which, where, (2) list, (3) outline, (4) 
describe, (5) contrast, (6) compare, (7) explain, (8) discuss, 
(9) develop, (10) summarise, and (11) evaluate.


2.3. Requisites of Essay-type Questions 


(a) Essay questions should be set to test only those instruc- 
tional objectives which are not amenable to testing by other 
forms. 

(b) Each question should be framed to test the specific mental
processes and learning outcomes implied by the objective in
view.

(c) Phrase questions in such a way that their meaning and
intent are clear to the examinees.

(d) Structure questions in such a way that the scope of the
expected answer is clear.

(e) Directional words like 'what do you know of', 'give an
account of', 'write short notes on', may be avoided to prevent
vagueness of answers and consequent subjectivity of
scoring.

(f) Maturation level of the examinees must be taken into 
consideration while constructing an essay question. The 
length and nature of answer will differ from class to class. 
For example, questions requiring discussion, interpretation, 
summarisation and evaluation may be asked in higher 
classes whereas questions like listing, describing, selecting 
etc. may be considered for lower classes. 


(g) Marks should be clearly allocated part-wise, whenever 
there is more than one part in the same essay-type question. 


The reliability of essay examinations can further be
improved if the students are trained properly through the use of
the important varieties of such questions in day-to-day testing
programmes in home examinations. They need to be
familiarised with the method of attempting such questions in
accordance with the connotation of the various words, especially
the directional words used to circumscribe the nature and
scope of the answer expected. This will ensure, to a great
extent, consistency between each student's understanding of what
he is required to write in response to a particular question and
the way the teacher is going to grade it.


2.4. Improving Essay Questions 


Improvement of essay questions may be made keeping in view
the factors that affect the measurement value of this instrument
both at the level of framing the questions and at the level of
scoring the scripts. This can be taken care of by taking some
precautions of a general nature regarding planning of questions
and their grouping in the form of a test, and by applying certain
specific criteria for structuring the questions to ensure better
reliability in marking. (4)


2.4.1. General Hints 

(a) Give as much time to writing a good essay question as it 
would require to score it. This would help to remove many of 
the angularities of questions that create trouble in marking. 

(b) Avoid choice of questions to discourage selective study, 
ensuring better content coverage and making comparison of 
students’ performance easier. 

(c) Structure each question keeping in view its time require- 
ment. 

(d) Keep the range of difficulty as wide as possible to cater
to students of different abilities.

(e) Eliminate as far as possible all irrelevant factors like 
spellings, handwriting etc., that affect scoring. Make 
specific provision for these if they are to be specially 
emphasised or considered at all. 

(f) Write explicitly the general directions for questions to 
provide a common base for all pupils to attempt questions 
on similar lines towards the same goals. 

(g) Try to increase the number of questions by including more 
questions of the restricted response type for securing more 
content coverage. 

(h) Use essay questions where you must and avoid where you
can. They may be utilised only for those objectives and
content areas which are amenable to testing through such
questions.


2.4.2. Specific Hints 

(a) Base your questions on the selected instructional objectives. 

(b) Make use of the specific content element appropriate to the
purpose of the question.

(c) Structure the question so as to delimit the scope of the
expected answer and pinpoint the area of response.

(d) Use simple, precise and unambiguous language to avoid
semantic difficulties.

(e) Indicate marks clearly, part-wise, if the question consists of
more than one part. Moreover, these marks should be in
whole numbers and divisible according to the number of
credit points. This ensures more objectivity in scoring.

(f) Set tasks in the question which require the students to
demonstrate their command of essential knowledge.

(g) Use familiar directional words and those which elicit the
required responses, e.g. compare, give reasons, etc. Directional
words like 'Write short notes on', 'What do you
know of', 'Give an account of' etc., should be avoided as
they create difficulty in understanding the nature and scope
of the expected answer. Directional words like 'discuss',
'comment', 'justify' should be used less frequently and
only after having trained the students in the use of such
words in relation to the type of responses expected in such
cases.

(h) Test your question by writing a model answer. Writing the
expected answer helps to improve the question by
serving as a check on the language used, the scope of the
question, the allotment of marks and the objective tested. This
leads to the remodelling of the question and/or the
answer to fit each other.


Example 

Let us take an example to indicate the various steps involved
in writing a good essay question. Before testing students on
the unit 'Organic Evolution', Miss Biology gave some thought
to the type of essay question she could frame to test
students' achievement on the unit. She framed the following
question:

(i) "Write an essay on organic evolution."

It looked quite acceptable at first, but at a second look she
felt that it would be too lengthy a question where students
can write anything about the concept, evidences, theories,
mechanism of inheritance etc. What she wanted was to
test students' ability to analyse various theories of organic
evolution. So she reworded the question as under:

(ii) "Write an account of various theories of organic
evolution."

On further reflection, she realised that in this revised
question there is over-emphasis on the recall of factual
information, whereas in the blueprint of the unit test
she wanted a question involving analysis and judgement-making
abilities. She tried to improve it further and
reformulated the question as under:

(iii) "Describe the various principles that underlie various
theories of organic evolution."

This question was also rejected by Miss Biology as she
felt that it did not really invoke students' analytical skills.
Therefore, the question was reworked as follows:

(iv) "Which theory of organic evolution appeals to you the
most and why?"

This question was considered nearer to the intended
objective as viewed by the framer. However, on close
scrutiny it was thought that what Miss Biology wanted
was to judge students' ability to argue for and against
various theories with regard to their adequacies or
inadequacies in explaining the mechanism of organic evolution.
She, therefore, reframed the question as follows:

(v) "Which theory in your opinion explains better the
mechanism of organic evolution and why?"

Even this question did not satisfy her completely when
viewed in the light of its scope, which remained limited to
testing of only one theory rather than providing for
evaluation of each theory to pass a judgement. Thus
the question was reworked and reframed as under:

(vi) "Discuss the various theories of organic evolution from
the point of view of the mechanism of inheritance."

Miss Biology reached the point where she was happy that
her question now satisfied the major criteria of a good
essay question. Nevertheless, what still remained was the
working out of the outline answer and value points which
would form the basis for the marking scheme. Moreover,
she had also to inform students whether handwriting,
spelling and grammatical mistakes would be considered
while grading the answers.


All the steps mentioned above relate to the cognizance of
instructional objectives, the scope of the expected response,
the use of the appropriate directional word, precise language,
intended content coverage and ensuring the maximum possible
scoring objectivity.


2.5. Grading Essay Questions 


The effectiveness of essay questions depends largely on how well
they can be graded. An ill-conceived and poorly-developed 
essay question cannot be salvaged by even the most carefully 
devised method of grading. Improper grading however can 
spoil the well-constructed essay question. Nevertheless, 
unreliable grading is one of the most valid criticisms levelled 
against essay questions and, therefore, we need not 
over-emphasise the scoring objectivity of the essay tests. To 
grade responses on essay questions we can try to: 


(a) identify appropriate methods to minimise biases,
(b) attend to significant and relevant aspects of the answer,
(c) apply uniform standards of scoring to all scripts,
(d) prevent personal idiosyncrasies that affect grading.


We should aim at optimum reliability only and try to reduce
intra-examiner and inter-examiner variance. Scoring uniformity
and scoring objectivity should be ensured as far as possible
by taking necessary steps at the time of setting the question
and while developing the marking scheme. Two methods of
grading can be used (5):


2.5.1. Grading by analytical method 
This method is also called the point score method, in which a
model answer is broken into specific value points, each of
which is given proportional weightage in allotment of marks.


2.5.2. Global scoring method 

It is sometimes also called the rating or holistic method. In
this method the ideal answer is not split into component parts
or specific points, but it simply serves as a standard. The
responses of students are graded on the quality continuum
which has a number of standards or anchor points. The rater
reads the response, forms a general impression and assigns a
rating, using some standard on the quality continuum. A
response can be graded as good, average or poor. To be more
discriminating, it may be graded on a 5-point scale like excellent,
good, average, satisfactory and poor. Such scale values can
be established by preparing a variety of answers corresponding
to the various scale points. Alternatively, the scorer can
select papers from the already written actual responses to
establish the various anchor points. In a five-point scale, the
responses can be categorised in terms of:

superior quality,
above average quality,
average quality,
below average quality, and
inferior quality.

The reader, after a rapid reading, can assign scripts into 5
piles. To ensure more objectivity each response should be
read and classified twice, preferably by a different rater who
assigns the grade independently. When a large number of
essays are to be graded, the global method is more useful,
especially for the teachers who have only 30-40 scripts for
grading. However, if the number is extremely small, it will be
difficult to establish anchor points. In case of a homogeneous
group of students, we can make a preliminary reading and
assign each paper to one of the piles. Each question can then
be re-read one or two times to reclassify the scripts which are
misclassified.


Under the analytical (point score) method, a student's score
depends on the number of value points contained in his answer.
Where expression is involved, related value points like those of
logical organisation, effectiveness of expression, arguments to
support or reject an issue etc. are specified and assigned point
values. Thus a sort of listing of value points in the form of a
checklist is done for the use of examiners. Depending upon the
complexity, significance and length of the expected responses,
different values for different points can be given, e.g. one
credit point may carry one mark while another can carry two or
three marks. As the scorer reads the responses, he goes on
giving marks for those credit points which are contained in the
answer.

The analytical method minimises the influence of extraneous
factors. Despite a detailed scoring guide, it is not possible to
control human variability because of the fallibility of the human
teacher, i.e. the scorer. However, this method yields very
reliable scores when used by a conscientious scorer. The outline
answer and detailed marking scheme, when prepared along with
the framing of questions, help to remove a number of
incongruencies such as those of difficulty level, unrealistic
time, wording etc. The five sub-divisions of a model answer make
it easier to assign grades to students which are considered more
trustworthy. It is of course a laborious, time-consuming
method and may lead to identification of aspects or elements
which may be superficial. This method is more suitable to
apply in case of the restricted response variety of essay
questions which demand more specific responses. The extended
response variety of essay questions can be scored by this method
more reliably for the same reason.
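The bookkeeping of the point score method described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `ValuePoint` class, the `score_answer` function and the sample checklist (the value points and the marks attached to them) are hypothetical and are not taken from any marking scheme in this handbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValuePoint:
    """One credit point of the model answer and the marks it carries."""
    label: str
    marks: int


def score_answer(checklist, points_found):
    """Add up the marks of every checklist value point found in an answer."""
    found = set(points_found)
    known = {vp.label for vp in checklist}
    unknown = found - known
    if unknown:
        # Guard against ticking a point that is not in the marking scheme.
        raise ValueError(f"not in the marking scheme: {sorted(unknown)}")
    return sum(vp.marks for vp in checklist if vp.label in found)


# Hypothetical checklist: different points may carry different marks.
checklist = [
    ValuePoint("states the theory", 1),
    ValuePoint("explains the mechanism", 2),
    ValuePoint("argues adequacy or inadequacy", 3),
]
max_marks = sum(vp.marks for vp in checklist)

# The scorer ticks the value points actually present in one script.
found_in_answer = ["states the theory", "explains the mechanism"]
print(score_answer(checklist, found_in_answer), "out of", max_marks)
```

Because every scorer works from the same checklist, two examiners who tick the same value points necessarily arrive at the same score; disagreement is confined to deciding whether a point is present.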


3. Improving Reliability of Scoring 


The following precautions may be taken to control the
reliability:

(a) Use precise, unambiguous and understandable language to
enable the examinees to make the same meaning out of the
question, so that they all direct their attempts at the same
thing.

(b) Delimit the scope of the expected response by structuring the
question in such a way that it pinpoints the area of
response desired.

(c) Use appropriate directional words to avoid semantic
difficulties. A particular directional word should be carefully
selected in conformity with the abilities to be tested. This
helps to improve validity by directing the purpose of the
question, thereby ensuring more reliability.

(d) Allocate marks part-wise where the question reflects different
aspects to be credited separately. It helps the examinees
to lay proportional emphasis on each part and
facilitates scoring on the part of the examiner.

(e) Make use of proper directions in the question paper to
enable the students to know whether organisation, handwriting,
brevity etc. in the answer would be given some
credit or not. This will also reduce unreliability.


4. Improving Scoring Objectivity


The following suggestions may be considered useful to reduce
subjectivity in scoring essay type questions (6):


(a) Write an ideal answer to the question in advance. This
should include the major points to be evaluated. These are the
value points expected from the students. In extended
response items, the main points may be outlined. This
provides a common base for grading the scripts, thereby
reducing subjectivity in scoring. Model answers prepared
for each question at the time of construction of items help
to phrase the question more clearly to convey the type of
answer expected.

(b) Evaluate answer scripts as anonymously as possible. The
impression a student makes as a person causes a persistence
effect while marking the scripts. To avoid this type of bias,
either some fictitious roll number may be allotted or students
may be asked to put their names at the end of the answer
script so as to conceal their identity till the whole answer
script is examined.

(c) Use the analytico-synthetic method for grading. This
involves scoring by the point method or the rating method.
In case of restricted response questions, the point method
involving analysis of the various value points may be used
by comparing the responses with the ideal answer prepared
in advance. For global marking, the rating method may
be used by placing the answer scripts in different piles, say
five, according to the quality of responses. This method
suits evaluating questions of the extended response type
where gross judgements relating to organisation of ideas,
their relevance etc., are to be made. Whenever the latter
method is used, it is profitable to rate each characteristic
separately (organisation of ideas, relevance etc.). The
common 100-point scale is too precise to be used and
that is why it may be desirable to use a 5-point or 9-point
scale. Moreover, instead of a point score we may use the
range score or the band score based on the score and the
standard error of measurement. For example, for a score
of 50, if the standard error of measurement is 2 and we
take deviations up to three sigmas, we have a band score of
44-56.

(d) Use question-wise scoring rather than script-wise scoring.
When different scripts with different quality levels of
responses have been piled after going through them hurriedly,
scoring of each script may be undertaken. It is better
to evaluate one question through all the scripts before
passing on to the next question. This helps in using the
same basis for judgement for all questions through constant
comparison of the same type of responses. This reduces
the shifting of standards from one paper to another paper.
When this method is used, there is less likelihood of the
"persistence effect", which is always formed when we
examine script by script.

(e) Score the question for each aspect separately according to
the part-wise allocation of marks. This will tend to enhance
scoring objectivity.

(f) Mark some scripts for the second time for comparison with
the original score to confirm reliability of marking. A few
scripts in the beginning may be left unmarked and a
separate record of the score may be kept for this purpose.


(g) Reshuffling of answer scripts after marking each question
helps to avoid assessment of the same scripts in the same
order, thereby reducing the carry-over effect.

(h) Double grading may be used when the test is to be used
for selection or awarding scholarship etc. Otherwise, only
a few scripts may be rated independently by two examiners
for better comparison and to confirm reliability of scoring.

(i) Total the marks of the scripts at the end, after marking all
the scripts. This will avoid carry-over of any impression
that is formed about the quality of the script when marks
are totalled.

(j) Avoid the creeping in of extraneous factors like spelling,
handwriting, punctuation, sentence structure, neatness
etc., during marking of the scripts. Such outcomes, if
stressed for their own sake, should be evaluated separately
and specific marks may be kept apart for independent
assessment. This would tend to avoid contamination of
test scores which indicate achievement of other specific
outcomes of learning.

(k) Examiners may meet together after marking some scripts
to discuss the marking scheme and compare marking.
This results in reducing inter-examiner variability.

(l) Although it may be a little costly to provide separate
answer sheets for each question, there is no denying the
fact that it can bring improvement in scoring objectivity
by allocating scripts question-wise to the examiners. As
each examiner would be marking only one question, scoring
objectivity is sure to improve.

Essay examinations, in spite of some inherent weaknesses,
have come to stay and are the major tools of evaluation
used in internal as well as external examinations. Their
significance for measurement of specific learning outcomes,
which are not measurable by other types of examinations,
cannot be overlooked. What is needed is further development
and research in this field to improve reliability of
responses and accuracy of scoring.

(m) Be consistent in grading the scripts. There are other
problems that need to be researched upon. In fact, problems
relating to reliability of essay questions are well known.
Studies conducted by Harper in India are worth quoting,
especially the two famous ones, 'Ninety Marking Ten'
and 'Four Thousand Re-examined'. (7)

His findings are eye-openers both to the examining
agencies and the policy makers in the field of school
education. Those who are interested in knowing details
of these studies may refer to the volume on 'Researches
on Examinations' by A.E. Harper published by NCERT
(1975).


Despite various limitations, the essay type questions have
come to stay and their role in measuring certain outcomes of
learning cannot be underestimated. What is needed is to
improve the construction of these tests and the scoring
procedures with a detailed marking scheme. Well-designed essay
questions with a well-structured scoring procedure could enable
the teachers to grade the students' responses more objectively,
besides improving their validity and reliability.
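The band score mentioned under hint (c) of 'Improving Scoring Objectivity' is simple arithmetic and can be sketched as follows. The function name `band_score` and its parameter names are illustrative assumptions; the figures (a score of 50, a standard error of measurement of 2, three sigmas) are those of the worked example in the text.

```python
def band_score(score, sem, sigmas=3):
    """Return the (lower, upper) band around an obtained score.

    score  -- the obtained point score
    sem    -- standard error of measurement of the test
    sigmas -- number of standard errors taken on either side
    """
    margin = sigmas * sem
    return (score - margin, score + margin)


# Worked example from the text: score 50, SEM 2, three sigmas.
print(band_score(50, 2))  # (44, 56)
```

A narrower band, for instance one standard error on either side, would give 48-52 for the same score.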


B. SHORT-ANSWER QUESTIONS


1. Concept 


Short-answer questions are open-ended questions with highly
restricted responses. These are free-response type questions in
which the response may vary from one word to a few lines
depending upon the degree to which restriction is imposed on
the expected answer. Such questions are ranked in between the
essay and the objective type question as regards the objectivity
of scoring. But this does not mean that they are exactly midway
between the two extremes. In fact, these questions can be
placed on a continuum ranging from highly objective scoring
to highly subjective scoring of responses. Thus they
merge with either type at the ends, based on the limits imposed
on the freedom of response. Such tests become practically
fixed-response type, like any other objective type items, when
the response is restricted to one word. For example,
when we put the question, 'Who invented the telephone?', it is
almost an objective type question. On the other hand, a question
like, 'Why do you feel hotter in summer on a day
when the relative humidity is comparatively high?' is supposed
to be a question of the short-answer variety, but it is very close
to an essay type question because its explanation may demand
an answer whose limits may not be determined from the
language of the question.

Although the short-answer variety is difficult to justify as a
separate class or form of question, yet for the convenience
of scoring, better content coverage and other
practical considerations, such questions can be categorised
separately. Nevertheless, we may fix some arbitrary limits for
such questions for practical purposes.


2. Scope 


In order that questions of the short-answer variety may
yield better reliability, the following limits may be imposed on
the freedom of response with reference to the allocation of
marks and time requirements. (8)


(i) Number of words 
Short-answer questions may be set to get responses which may 
vary from 30 to 50 words or so. 


(ii) Number of lines 
Although it is preferable to limit the response in terms of words
rather than lines, which may vary from individual to individual,
yet we may restrict it from one to five lines, presuming that
each line contains 6 to 8 words.


(iii) Number of credit points 
While framing short-answer questions it is desirable that the
expected response be limited to two, three or four value points.
In case the examiner is interested in testing more credit points
on the same content, he should either set an essay type question
or frame two or more short-answer questions on that content.


(iv) Allocation of marks
As the responses sought are restricted to a few specific value
points and the content area tested is also comparatively small,
short-answer questions carry fewer marks than essay type
questions. Usually one to three marks should suffice for a
short-answer question.


(v) Time requirement 

Depending upon the length of the answer and the credit points, 
the time required for each question may be estimated. 
Normally, a short-answer question should not require much 
time to answer. As the answers vary from one word to about 
fifty we may allot 1 to 5 minutes as the time limit for short- 
answer questions. All the limits imposed are arbitrary and 
based more on the practical experience rather than on any sound 
academic rationale. 
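The arbitrary limits listed above can be gathered into a small check that a paper-setter might run on a draft question. This is a minimal sketch under stated assumptions: the dictionary `SHORT_ANSWER_LIMITS`, the function `check_short_answer` and the field names are hypothetical, and the lower bound of 1 for words and credit points is an assumption that allows for one-word answers; only the upper limits (about 50 words, 4 credit points, 3 marks, 5 minutes) come from the text.

```python
# Limits suggested in the text; the lower bounds are assumptions.
SHORT_ANSWER_LIMITS = {
    "words": (1, 50),          # expected length of the answer
    "credit_points": (1, 4),   # value points expected in the answer
    "marks": (1, 3),           # marks allotted to the question
    "minutes": (1, 5),         # time needed to answer
}


def check_short_answer(spec):
    """Return a list of limits that a drafted question exceeds."""
    problems = []
    for key, (low, high) in SHORT_ANSWER_LIMITS.items():
        value = spec[key]
        if not low <= value <= high:
            problems.append(f"{key}={value} is outside {low}-{high}")
    return problems


# A question asking for six characteristics breaks the credit-point limit.
draft = {"words": 45, "credit_points": 6, "marks": 3, "minutes": 5}
for problem in check_short_answer(draft):
    print(problem)
```

A draft that fails the check would, following the text, be recast as an essay type question or split into two or more short-answer questions.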


3. Two Varieties of Short-Answer Questions 


Sometimes short-answer questions are classified as selection
type and supply type. The former type is named the objective
type while the latter type may be further divided into two
categories. Depending on the nature of the expected response, a
question may be called a 'very short-answer question'; but when
the response extends to about 30 to 50 words or so, involving
more value points than one, it may simply be called a
'short-answer question'. The questions given below
illustrate the difference in the variety of short-answer
questions, indicating the nature and scope of expected responses.

The list of questions below is not exhaustive
but only illustrative, to indicate the extent of the freedom of
response allowed to the examinees. However, in all cases a
restriction on the length of response and specific credit points
is imposed. In the last three questions (6, 7, 8) the emphasis is
on one value point. Question No. 5 could also be categorised as
the very short-answer (VSA) type, but because of its two value
points I have preferred to classify it as short-answer.


Questions
1. Describe in about 50 words the functions of various floral
   whorls in a typical flower.
2. Mention the names of different floral whorls of a typical
   flower giving the functions of each.
3. List one major function of each of the four floral whorls.
4. Give the technical terms for each of the four whorls
   of a typical flower.
5. Write two technical terms representing the
   essential parts of a typical flower.
6. The main function of sepals is to ...... (fill in
   the blank).
7. Name the whorl representing the male
   part of a typical flower.
8. Which of the two, calyx or corolla, is
   more helpful to a flower in pollination?


For the sake of clarity and variety in construction, short-answer
questions can be further sub-classified as under:


(a) Open ended   Restriction on response      Credit points not
                 length but not defined       stipulated (Q. 1, 2)

(b) Open ended   Response length defined      More than one credit
                                              point stipulated
                                              (Q. 3, 4 & 5)

(c) Open ended   Response in one word, one    Single credit point
                 phrase or one sentence       (Q. 6 & 7)

(d) Open ended   Response choice is           One credit point
                 provided in question         stipulated (Q. 8)


Thus all these types of short-answer questions can be
arranged on a continuum of freedom of response which admits
varying degrees of scoring objectivity. At one extreme they
involve subjectivity both in content and organisation of
responses; at the other extreme (Q. 8) scoring is 100% objective,
because the examinee only picks the correct response from the
two designated alternatives. Depending upon the nature and
content of a unit and the extent of scoring objectivity desired,
one can make a choice to include them in a test.


4. Hints for Construction of Short-Answer Questions


Short-answer questions, if constructed properly, can help to
overcome many of the limitations of the
essay and objective-type questions. The following points, if
kept in view while constructing short-answer questions, would
surely go a long way to improve their quality: (9)


(a) Use these questions for testing only those objectives which
are amenable to this type. These tests, for example, may
be used to test functional information, understanding and
application of knowledge to a great extent. Even certain
drawing skills representing specific areas or points can be
tested by using short-answer questions. However, such
questions cannot be used profitably to test mental processes
like the ability to organise, marshal facts, or develop a
prolonged and integrated argument, for which essay type
questions are most suited. Almost all specific objectives can be
tested by the use of these questions, as shown in the annexure
given at the end.


(b) Frame questions in such a way that maximum coverage of 
content is ensured. This is very essential to utilise each 
mark allotted to such questions by getting the maximum 
benefit of adequate coverage of content. For example a 
question carrying one mark may be put in either of the 
following forms: i 


(i) Mention one property of oxygen. 
(ii) Mention one chemical property of oxygen. 
(iii) Mention one chemical property of oxygen which is 
different from that of carbon dioxide. 
(iv) Mention one property of oxygen which is different
from that of carbon dioxide but resembles that of
hydrogen.


The first question tests almost nothing as one can write 
simply colourless/odourless/tasteless. In the second question 
content coverage is more than in the first because the 
examinee must know the difference between chemical and 
physical change before answering it. 


In the third question there is still more content coverage
involving two gases. In the fourth question the content
coverage is still greater, as it covers the properties of three
gases. It is this question which ensures maximum coverage,
utilising just one mark.

(c) Try to cover 2, 3 or 4 credit points instead of just one, for
maximising the utility of such questions as illustrated
below:


Example
(i) List one characteristic to distinguish between plants
and animals. (One value point)
(ii) List two common features and two contrasting
features of plants and animals. (Four value points)


(d) Limit the scope of the answer to such an extent that it
can be attempted within the time limit (1-5 minutes). It is
better to restrict the response to a maximum of 4 credit
points. For example, avoid a question like 'List six
characteristics of living organisms'. If, however, you are
interested to know more, frame a long-answer question or
two short-answer questions.


would expect from the pu 
language and definitenes 
controls the reliability of 


5. Very Short-Answer Questions 


In accordance with the mode of answering and questioning we 
may have the following types of very short-answer questions: 


5.1. Direct Question Form 


A straight question is put requiring a specific answer to be
supplied by the students.


Example 
(i) What is the technical term used for a whorl of petals in a 
typical flower? 
(ii) Who invented the telephone? 


5.2. Selection Form 


A direct question is put with 2 or 3 choices designated in the
question itself and the student is to write the answer after
making a proper selection. The student may be asked to supply
the answer or cross out what is not required.


Example 
(i) Which of the two, ‘go’ or ‘going’, is a present participle?
(ii) The founder of the Mughal empire was Akbar/Babar. (Delete
which is not required)


5.3. Completion Form 


A question is given in the form of an incomplete statement
and the student is to fill in the appropriate word or phrase.


Example 
(i) The first Russian to go into outer space was ........
(ii) The relationship between a carpel and the ovule is the same as
that between a stamen and the ........
(iii) In the life history of the frog the tadpole larva resembles
the fish. The phenomenon indicates that ........
(Ontogeny repeats phylogeny)




5.4. Identification Variety


In this case a student is asked to choose or underline a parti- 
cular word with respect to the point of relevance indicated in 
the direction of the question. 


Example 
(i) In the following sentence underline the word which is an
adjective.

I always prefer to buy a black pen.

(ii) Which of the two, Arunachal and Goa, is a Union Territory?


5.5. Recognition Variety


In this case a specimen, an apparatus, an object or
sketches are given, out of which one is to be recognised by
careful observation.


Example 


(i) Which of the given objects is a thistle funnel? (To be
pointed out)

(ii) Which of the given apparatuses is used for filtration of
water? (To be selected)

(iii) Which of the sketches provided is of a Fahrenheit thermometer?
(To be ticked or underlined)


6. Suggestions for Construction of Very-Short- 
Answer Questions 


Whichever form is used, depending upon the teachers’ preference
and students’ experience, care must be taken to restrict
the response to one word, a phrase or at the most a sentence,


questions in a question form and students also respond favour- 
ably to such questions as they elicit direct responses. Much 
depends on the construction of the question itself to get the 
desired response. A few suggestions given below would be help- 
ful to achieve this purpose: 


(a) Frame the question in such a way that it has only one
correct answer. This can be done by requiring a significant
or key word as the expected response. Otherwise the item
will not contribute to the objective intended because there
could be many correct answers with reference to different
synonyms or near synonyms.


Example
Poor : Who wrote the ‘Origin of Species’? (Charles Darwin)
Improved: The name of the author of the ‘Origin of
Species’ was ........


In the first case, the answers may be many, e.g. an author, a
biologist, a scientist, an Englishman, an English scientist
etc., each one of which would be technically correct
although the desired response is Charles Darwin.


(b) Use precise and unambiguous language to make the answer
definite.

Example
Poor : An animal which eats the flesh of other animals
is ........ (carnivorous)
Improved: An animal which eats the flesh of other animals
is classified as ........


In the first question the answer may be tiger, wolf,
meat-eater etc., which is not the desired response, while in the
latter the answer is pinpointed.


(c) Minimize the use of textbook phrases or stereotyped
language in phrasing the questions. Such phrasing rewards
and encourages rote memorisation, which is not always
associated with real understanding, and often causes
ambiguity.


Example

Poor : For every action there is an equal and ........
(opposite reaction)

Improved: A piece of stone, when struck against a wall,
comes back due to Newton's ........ law of
motion.


(d) A direct question form is generally preferable to an
incomplete statement, especially in the case of younger students.


Example 
Poor : The capital of Sikkim is ........ 
Improved: Which is the capital of Sikkim? 


(e) Avoid grammatical and other clues. 


Example 
Poor : The group of tissues form an ........ (organ)
Improved: An organ is formed by a group of ........ (tissues)


In the first case ‘an’ gives a clue to the answer ‘organ’, which
can be associated with the article placed before it.


(f) Avoid over-mutilated statements involving a number of 
blanks. When too many blanks are left in an incomplete 
statement, the meaning is almost lost. 


Example 

Poor : The ........ by which ........ is separated
into various ........ having different ........
is known as ........

Improved: The process by which petroleum is separated
into various components having different
boiling points is known as ........
(fractional distillation)


(g) Avoid indefinite statements.




Example: 
Poor : When was Mahatma Gandhi born? 
Improved: In which year was Mahatma Gandhi born? 


In the first case the answer could be the date, month, year or
even ‘the same day as Lal Bahadur Shastri’.
(h) Blanks should be kept uniform in length to avoid any clue 
regarding the length of response expected. 
(i) Blanks should be arranged in a manner convenient for 
scoring. It is better to have blanks towards the right hand 
side of the question or the statement. 


(j) Before scoring prepare a key which contains all acceptable 
answers. 


Example: 


Which method is used for carrying ripe fruits to long 
distances? 


In this case all possible responses like refrigeration, 
preservation by ice, cooling, etc. should be treated as 
correct. 
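Suggestion (j) can be pictured as a small lookup against a prepared key. The following is only a minimal sketch: the names `SCORING_KEY`, `normalise` and `score_response` are hypothetical, and the rule of ignoring capital letters and extra spaces is an assumption, not something the text prescribes.

```python
def normalise(answer: str) -> str:
    """Lower-case the answer and collapse runs of whitespace."""
    return " ".join(answer.lower().split())


# Hypothetical key: one set of acceptable answers per question number.
# Question 1 is the ripe-fruit example given above.
SCORING_KEY = {
    1: {"refrigeration", "preservation by ice", "cooling"},
}


def score_response(question_no: int, response: str, marks: int = 1) -> int:
    """Award full marks if the normalised response is in the key, else zero."""
    acceptable = {normalise(answer) for answer in SCORING_KEY[question_no]}
    return marks if normalise(response) in acceptable else 0
```

Preparing the key before scoring means that every scorer treats 'cooling' and 'refrigeration' alike, which is the purpose of the suggestion.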


This variety of short-answer questions, i.e. very short-answer
questions, should be very carefully worded to restrict the
response to the desired extent, otherwise scoring becomes
difficult. Moreover, unless special care is taken, these
questions tend to test mere recall of specifics, often
trivial details like names, dates, events, etc. As the
response in such questions is just one word, a phrase or at the
most a short sentence, very careful structuring of such questions is
needed in order to test effectively objectives other than mere
factual information.




ANNEXURE 


Objective-based Short-Answer Questions 
For class VI-VIII 


Concept 


For germination of seeds, air, water, and suitable temperature 
are necessary. 


Illustrative questions testing different abilities 
1. Knowledge Objective 
1.1. Recalls 
List three conditions necessary for germination of seeds. 


1.2. Recognises 


In the absence of which of the following conditions can a seed 
germinate? 


(i) Moisture, (ii) Temperature, (iii) Oxygen, (iv) Soil (V.S.A.)
2. Understanding Objective 

2.1. Translates 

Define the term ‘germination’ in your own words. 


2.2. Illustrates 


Give one example of a seed that requires more moisture for 
germination and one requiring comparatively less moisture. 


2.3. Identifies Relationship 


Why does a seed not germinate when sown too deep? 


2.4. Compares 


In what two respects does germination of a gram seed differ 
from that of a castor seed? 


2.5. Classifies 


Categorise the following seeds into two groups on the basis of 
their mode of germination. 

A. Pea
B. Gram
C. Castor
D. Maize
E. Sunflower
F. Onion


2.6. Detects Errors 
Observe the given diagram B of the experiment to show the
conditions necessary for germination and point out the mistake,
if any. Suggest rectification also.


2.7. Interprets


What does this experiment demonstrate? 


Fig-A 


2.8. Explains 


Why do we use three seeds instead of one in the above 
experiment? 
(Figure—A Question 2.7) 


3. Application Objective 
3.1. Analyses 


Some quality seeds of wheat were sown in two separate pots,
A and B, using the same quality seeds and the same sample of moist
soil, and kept at the same place. Seeds in pot A germinated but
not those in pot B. Under what condition could this result be
relevant?


3.2. Hypothesises 

When mustard seeds are scattered on suitably prepared soil,
they germinate but maize seeds do not. When sown a little deep
under the same soil maize seeds germinate but not the mustard
seeds. Propose a hypothesis to explain these observations.


3.3. Suggests Procedure 


Bigger seeds require more moisture than smaller seeds for
germination. Suggest an experimental procedure to test this 
hypothesis. 


3.4. Gives Reason 


Why do seeds generally not germinate when there is a heavy 
rainfall immediately after sowing? 


3.5. Draws Conclusion 


What conclusion can you draw from the observations of the experiment
shown in figure-C (Question 2.7)?


4. Skill Objective 
4.1. Draws Sketches 


Draw a labelled diagram to show the experimental set-up of 
the three bean experiment. 


4.2. Fits up Apparatus 


Examine the diagram given on the black board and fit up the 
apparatus to demonstrate the three bean experiment. 


4.3. Observes and Records 


Examine the three diagrams carefully. Note and record the
difference in the conditions of germination available in the
three cases.


CHAPTER X


OBJECTIVE-TYPE ITEMS 
(MULTIPLE CHOICE TYPE) 


1. Introduction 


Of late, objective-type items have become quite popular in India,
partly because of their inherent objectivity in scoring but more
so perhaps due to the increasing demand for such items in
selection tests. In spite of the fact that objective-type
tests are increasingly used in most of the entrance examinations
for admission to medical, engineering, agriculture, commerce
and other professional courses, the quality of such tests is far
from satisfactory. Leaving aside a few specialists or trained
evaluators, there is a dearth of good item writers. For
classroom teachers, the use of such tests is still wishful
thinking, for two reasons. Firstly, teachers by and large
are not thoroughly trained in the technology of objective
questions. Secondly, the use of such tests demands a lot of
facilities like stencilling, duplicating, stitching and
administration which are generally not available. Nevertheless,
once such tests are developed and cyclostyled or printed, they may
be used over a period of time, as they can be kept confidential
after every administration.

This chapter is devoted to one of the most commonly
used forms of objective-type items, namely, the multiple choice
type. It is intended that the study of this chapter will enable the
reader to




(a) describe the nature, scope and characteristics of various 
forms of multiple choice questions, 

(b) construct various forms of multiple choice questions test- 
ing different abilities implied under various objectives, 

(c) detect various types of clues and other defects in the given
items of multiple choice variety and rectify them,

(d) identify the basis of improvement of defective items,

(e) explain the merits and limitations of different forms of
multiple choice items,

(f) apply the principles underlying the development of stems and
responses while constructing multiple choice questions,

(g) judge the quality of a given item using prescribed criteria, 

(h) appreciate the role of multiple choice items in improving 
the validity and reliability of the measuring instruments. 


These tests overcome the criticism levelled at essay questions,
which have poor content-sampling and unreliable scoring, are
time consuming to score, and are amenable to bluffing and other
extraneous factors. Objective-type items ensure more adequate
coverage, as many questions can be asked in the same testing
time, resulting in higher reliability and more content validity,
besides ease of scoring and interpretation. Some critics,
however, contend that objective-type items encourage rote
memory and emphasise testing of factual information because
the higher mental processes cannot be effectively measured
through such tests. This, of course, is an erroneous belief and
a misconceived notion. Even if the scrutiny of such item-pools
reveals a poor kind of items, it only reflects our bankruptcy in
framing higher level items testing abilities like application,
analysis, synthesis or evaluation, rather than any inherent
limitation of the objective-type items. Discussion in the pages
that follow would amply support my contention that if the item
writer understands the purpose and technology of an item he
can, with practice, frame items which are really
thought-questions requiring students' ability to think.

The term ‘item’ is usually preferred to ‘question’ in case of
objective-type items. Since every question need not be in a
question form, i.e. followed by a sign of interrogation, but can
be set even in the form of a complete or incomplete statement,
it is more appropriate to use the term item in case of the
objective-type question. What is important is that an objective-type
item must have a precisely-worded, predetermined correct
response, whatever instructional objective it tests or the format
it takes. Thus scoring of such items can be done by any person
without a knowledge of the subject. It may even be scored by
a machine, as is done now in some of the public examinations
conducted by various boards of school education in India.
There are many forms of objective-type tests depending
upon the type of stimulus given and the type of responses
provided within the item. A detailed classification of objective-type
items is given hereinafter.
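Because each item has one predetermined correct response, scoring reduces to a mechanical comparison against the key. The sketch below is only an illustration of that idea; the function name `score_test` and the convention that an omitted item earns no mark are assumptions.

```python
def score_test(key: dict, responses: dict) -> int:
    """Count the items whose chosen letter matches the key.

    key:       item number -> correct letter, e.g. {1: "B", 2: "D"}
    responses: item number -> letter marked by the student
    Items the student has omitted earn no mark.
    """
    return sum(
        1
        for item, correct in key.items()
        if responses.get(item, "").strip().upper() == correct
    )


key = {1: "B", 2: "D", 3: "A"}
student = {1: "b", 2: "C"}      # item 3 left unanswered
print(score_test(key, student))
```

No judgement about the subject matter enters the comparison, which is why a clerk or a machine can do the scoring.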


MULTIPLE-CHOICE ITEMS 


As indicated in the classification of objective-type questions,
the multiple-choice items are based on a response-directed
stimulus in which responses or options may be arranged in
different ways. The choice of the correct answer among the
given options or responses can be made from an independent
set of responses given for each item or from the same set of
responses given for different items. Accordingly, we have
different varieties of multiple-choice items in terms of design or
format of the items as given in the classification flow-chart in
the beginning.


Format 
The format of multiple-choice items differs from type to
type. Since basically all objective-type items (selection type)
are multiple-choice items, the variation is in the manner in
which the stimulus is provided and the responses arranged, as
also the mode of responding or writing the answer. That is
why the design of each type of multiple-choice item, as also the
characteristics and hints for construction, are discussed
separately. In this chapter we take the traditionally-used
multiple choice variety.

[Figure: classification flow-chart of objective-type items, showing items based on a response-directed stimulus and items based on a statement-directed stimulus]

An ordinary multiple-choice item consists of an incomplete
statement or a question, called the stem of the item, followed
usually by four or five suggested answers (choices, called the
responses or options) out of which usually one is correct. The
correct answer is called the key and the incorrect answers are
called the distractors, all of which should be plausible. The responses
are usually lettered A, B, C, D, E, and the student is
required to choose the letter that represents the correct
response, i.e. the key.


Example
Stem: Which of the following towns is a state capital?
Options or responses:
A. Allahabad (distractor)
B. … (distractor)
C. … (distractor)
D. Bhopal (key)


A common multiple-choice item is usually based on written material
but it may be based on a diagram, picture, graph, table or a
map. Items may be independent or tagged to the same stem
or introductory material, in which case they are termed
structured questions. A more detailed description of these will
be found towards the later part of this chapter.

As already stated, an item can be given in the form of a
question or an incomplete statement followed by responses.
The two varieties are given below, followed by guidelines
indicating the mechanics of writing such items.


Question form

Q. 1. Which of the following instruments is used to measure
the altitude of a place?

A. Thermometer
*B. Barometer
C. Hydrometer
D. Lactometer

1 (B)


Incomplete statement form

Q. 1. For measuring the altitude of a place we use

A. thermometer. [ ]
B. barometer. [x]
C. hydrometer. [ ]
D. lactometer. [ ]


Mechanics of Writing

1. The first letter of the word of each alternative is written in
small letter if the stem is incomplete, unless this word is a
proper noun.

2. Each alternative is followed by a full stop in case of the
incomplete stem.

3. The first letter of the word of each alternative is written in
capital when the stem is in a question form.

4. Write alternatives below the stem a little away (two spaces)
from where the stem begins.

5. Underline or capitalise words like ‘not’, ‘best’, ‘except’,
‘least’ etc. when used in the item stem.

6. Use Arabic numerals for serialising the items.

7. Use capital A, B, C, D, and E for numbering the responses.

8. Specify the place either by a blank or a box or by
a bracket ( ) for indicating the response, towards the
right hand after the last alternative, and number the
bracket by the corresponding serial number of the item like
5( ). Sometimes it is advisable to put boxes or brackets
against each of the options.

9. Mode of indicating the response, e.g. (B) or an encircled B, may be
specified. For lower classes sometimes encircling of the letter
against correct answers is recommended.

10. For the same test keep the same number of options (4 or 5)
for all the items. Four seems to be more practicable.
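Several of these conventions are mechanical enough to be applied by a small routine. What follows is a rough sketch only: the function `format_item`, its `proper_nouns` parameter and the choice of putting the numbered response bracket on a line of its own are all assumptions made for illustration.

```python
def format_item(number, stem, options, proper_nouns=()):
    """Lay out one multiple-choice item as plain text.

    proper_nouns: indexes of options whose capital letter must be kept
    even under an incomplete stem (hypothetical parameter).
    """
    question_form = stem.rstrip().endswith("?")
    lines = [f"{number}. {stem}"]
    for index, (letter, option) in enumerate(zip("ABCDE", options)):
        option = option.strip().rstrip(".")
        if question_form:
            # Question stem: capital first letter, no full stop.
            text = option[0].upper() + option[1:]
        elif index in proper_nouns:
            # Incomplete stem, proper noun: keep the capital, add full stop.
            text = option + "."
        else:
            # Incomplete stem: small first letter, closing full stop.
            text = option[0].lower() + option[1:] + "."
        # Alternatives start two spaces away from where the stem begins.
        lines.append(f"  {letter}. {text}")
    # Response bracket numbered with the serial number of the item.
    lines.append(f"{number}(   )")
    return "\n".join(lines)


print(format_item(1, "For measuring the altitude of a place we use",
                  ["Thermometer", "Barometer", "Hydrometer", "Lactometer"]))
```

The routine decides between the two forms only by looking for a sign of interrogation at the end of the stem, so underlining of words like ‘not’ or ‘best’ is still left to the item writer.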


Arrangement of items

1. Group different types of items according to the number of
choices provided in different items.

2. Check whether correct responses in items are placed at
random and not in some definite order, to avoid clues.

3. See that all alternatives of the same question remain on the
same page while writing for a test.

4. To save space, all alternatives may be written in a single
row if they are small, or in two rows if needed, in classroom
testing.

5. Provide common directions for similar types of items, e.g.
single correct response, multiple correct response,
re-arrangement type, etc.
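The second point, random placement of the key, can be checked mechanically. The sketch below is one possible way of doing so, not a prescribed procedure: the function names are hypothetical and the limit of three identical keys in succession is an arbitrary assumption.

```python
import random
from collections import Counter


def shuffle_options(options, key_index, rng):
    """Shuffle the options of one item; return (new_options, new_key_letter)."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    # The key now sits wherever its old index landed in the new order.
    return shuffled, "ABCDE"[order.index(key_index)]


def key_positions(key_letters):
    """Tally how often each letter carries the key across the test."""
    return Counter(key_letters)


def has_long_run(key_letters, max_run=3):
    """Flag more than max_run identical keys in succession (assumed limit)."""
    run = 1
    for previous, current in zip(key_letters, key_letters[1:]):
        run = run + 1 if current == previous else 1
        if run > max_run:
            return True
    return False


rng = random.Random(7)  # fixed seed so that the paper can be reproduced
options, key = shuffle_options(
    ["Thermometer", "Barometer", "Hydrometer", "Lactometer"], 1, rng)
print(options, key)
```

A tally from `key_positions` that is heavily lopsided, or a long run of the same letter, is the kind of definite order that gives a clue to the test-wise student.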

POTENTIALITIES OF MULTIPLE CHOICE ITEMS 


The very fact that in most selection tests only multiple-choice
tests are used bears testimony to the inherent qualities
of these items. Though it is not easy to frame good questions
of this variety testing the higher abilities, yet with good practice
and knowledge of the technology of developing such items, it
should not be very difficult to frame questions testing almost
all the cognitive abilities envisaged under the Knowledge,
Understanding and Application objectives as enunciated in the chapter
on educational objectives. It is an erroneous belief that such
items cannot be set to test higher level objectives. The following
examples of objective-based items would amply support this
viewpoint. To avoid subject-based content difficulty, the
content is taken from pedagogy itself, relating to the forms of
questions and the objectives.


ILLUSTRATIVE ITEMS TESTING DIFFERENT ABILITIES 
1. Knowledge Objective 
1.1. Recognises/Recalls


Which of the following forms of question is of the free response 
type? 


*A. Completion type 
B. Multiple choice 
C. True-false type 
D. Matching type 


2. Understanding Objective 


2.1. Translates 


Which one of the following terms is closest in meaning to the 
term taxonomy? 


A. Objective
B. Construction 
*C. Classification 
D. Evaluation 


2.2. Cites Examples 


Which of the following abilities illustrates the objective
‘Comprehension’ according to Bloom's taxonomy?


A. Comparison 
B. Classification 
C. Prediction 
*D. Extrapolation 


2.3. Identifies Relationship


If the number of questions in an objective-type test is reduced 
from 60 to 40 which of the following characteristics of the 
test would NOT be affected? 


A. Validity 

B. Reliability 
C. Practicability 
*D. Objectivity 


2.4. Compares 


Which of the following characteristics of a question is common 
to both essay and short-answer type questions? 


A. Fixed response 

B. Extended response 
*C. Freedom of response
D. Selection of response 


2.5. Classifies


Which of the following pairs of objectives is NOT grouped in
order of hierarchy of Bloom’s taxonomy?


A. Knowledge and Comprehension
B. Comprehension and Application 
C. Analysis and Synthesis 

*D. Analysis and Evaluation 




2.6. Detects Errors 


In terms of Bloom’s taxonomy, in which of the following sets
of abilities, arranged in order of complexity, did a student make
a mistake?


A. Knowledge of specific facts, principles, theories
*B. Interpretation, translation, extrapolation 
C. Analysis of elements, relationships, organisational 
principles 


D. Production of unique communication, plan, derivation of 
abstractions 


2.7. Interprets 
If a true-false test is to replace a multiple-choice test, the
number of items needs to be greater in order to maintain almost
the same

A. Level of difficulty 
*B. Degree of reliability 
C. Coefficient of validity 
D. Extent of objectivity 


2.8. Explains


In an essay-type test, we should not permit overall options
(Any 5 out of 9 type) because that ensures much better


A. Content coverage 
B. Coverage of objectives 


C. Objectivity in scoring 
*D. Comparison of students’ performance 


3. Application Objective 
3.1. Analyses 


In a multiple-choice test with four alternatives, the students can
attempt 25% of the items by guessing alone. Which one of the
following assumptions is the author making?


A. That the students would make some informed guesses 
*B. That the students would make blind guesses 

C. That the students would make guesses in 25% questions 

D. That 25% students would make blind guesses 


3.2. Hypothesises 


In Bloom's taxonomy, the educational objectives are arranged 
in increasing order of complexity (hierarchy) i.e. Comprehension 
is more complex than Knowledge; Application more complex 
than Comprehension and so on. Moreover, each higher-level 
objective subsumes the next lower level objective i.e. 
Evaluation includes Synthesis which includes Analysis that in 
turn includes Application and so on. From this we can 
hypothesise that factor analysis should exhibit 


*A. 1 unique ability, 1 common ability and 4 group abilities 
B. 2 unique abilities, 5 common abilities and 6 group abilities 
C. 2 unique abilities, 4 common abilities and 5 group abilities 
D. 1 unique ability, 6 common abilities and 1 group abilities 


3.3. Suggests New Procedures/Alternatives


An objective test consisting of 50 True-False items was found
to have a low reliability of 0.50. Which of the following
measures is likely to increase its reliability the MOST?


A. Convert all the 50 items into completion type 
*B. Convert all the 50 items into multiple-choice type 

C. Increase the number of items to 60 

D. Reduce the number of items to 40 
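The effect of options C and D in the above item can be roughly estimated. The text does not name a formula; the sketch below assumes the commonly used Spearman-Brown relation and further assumes that any added items are of the same quality as the existing ones. It says nothing about the gain from changing the form of the items (options A and B).

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimate reliability after the test length is multiplied by length_factor.

    Assumes the Spearman-Brown relation r' = k*r / (1 + (k - 1)*r),
    where r is the present reliability and k the length factor.
    """
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Option C: 50 items increased to 60 (k = 1.2)
print(round(spearman_brown(0.50, 60 / 50), 2))   # about 0.55
# Option D: 50 items reduced to 40 (k = 0.8)
print(round(spearman_brown(0.50, 40 / 50), 2))   # about 0.44
```

Under these assumptions, adding ten items brings only a modest gain, while removing items lowers reliability, which is also the relationship tested in item 2.3.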


3.4. Gives Reasons/Establishes Relationships


A blue-print of a question paper is a three-dimensional chart
in which the distribution of questions and marks of various forms
of questions testing different objectives relating to different
content areas is shown. That is why a good blue-print
ensures better validity because of adequate sampling of


A. Content elements in each unit

B. Instructional objectives

C. Questions relevant to unit-objective
*D. All the above factors


3.5. Concludes/Infers


An ‘Application’ item measures ‘Understanding’ and ‘Knowledge’
objectives also, but an item testing ‘Understanding’ does
not test ‘Application’ but does test the ‘Knowledge’ objective. A
‘Knowledge’ item tests neither the Understanding nor the
Application objective. From this the best conclusion that can
be drawn is that objectives are


A. Arranged in hierarchical order but are not cumulative
B. Cumulative in nature but not hierarchically arranged

*C. Arranged in hierarchy and are cumulative also
D. Neither hierarchical nor cumulative in nature


3.6. Predicts 


Supposing a teacher teaches biology for three objectives, namely
Knowledge, Understanding and Application. If in a unit test
consisting of 50 multiple-choice items testing Knowledge and
Understanding, he includes 5 more items based on the Application
objective, what would happen to the validity and
reliability of the unit test?


A. Validity would decrease but reliability would increase

B. Reliability would decrease but validity would increase
*C. Validity as well as reliability would increase

D. Reliability as well as validity would decrease


3.7. Judges (Evaluates)


Essay type tests overrate the importance of ‘How to say a
thing rather than having something to say’. This observation
can be justified if we believe that essay type tests are


A. More reliable than objective-type tests because they ensure
better coverage of content through long answer questions
*B. Less reliable than objective-type tests because validity of
students’ responses is affected by extraneous factors
C. More valid than objective-type tests because they ensure
better the validity of responses
D. Less practicable than objective-type tests because they are
difficult to score and interpret


VARIETIES OF MULTIPLE CHOICE ITEMS


1. Single-response Variety 


This is the most commonly used form of multiple-choice items 
which are usable in classroom testing, public examinations and 
selection tests. These items have 4 or 5 alternatives with one 
and only one correct answer. 


Example
Which of the following enzymes is produced by the salivary
glands during digestion of food?


A. Trypsin 
*B. Ptyalin 
C. Pepsin 
D. Erepsin 


2. Multiple-response Variety 


In this type, more than one answer may be correct. The
choice of the correct answer is, therefore, not limited to one
key but may extend to more than one. A specific direction is therefore
necessary for the examinees, requiring them to identify more
than one key. Such items are difficult to score and use in
public examinations but they are very good for use in unit
testing to know students’ real understanding of the content
elements, and provide good training in improving their
discriminating power.


Example
Which of the following plants can be classified as hydrophytes?


A. Cuscuta
*B. Lotus
C. Calotropis
*D. Hydrilla


3. Best-response Variety 


In this case more than one could be the correct answer or all of
them may be correct. But the one that best describes or explains
the concept, principle or idea is considered the correct
answer. Such questions test students' ability to discriminate
between closely-related concepts. In such questions, the direction
must be clear and the word ‘best’ be underlined or
capitalised as shown below:


Example 
Which of the following chemical equations explains BEST the 
mechanism of photosynthesis? 


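A. $6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} = \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}$

B. $6\,\mathrm{CO_2} + 12\,\mathrm{H_2O} = \mathrm{C_6H_{12}O_6} + 6\,\mathrm{H_2O} + 6\,\mathrm{O_2}$

C. $6\,\mathrm{CO_2} + 12\,\mathrm{H_2O^{16}} = \mathrm{C_6H_{12}O_6} + 6\,\mathrm{H_2O} + 6\,\mathrm{O_2^{16}}$

*D. $6\,\mathrm{CO_2} + 12\,\mathrm{H_2O^{18}} = \mathrm{C_6H_{12}O_6} + 6\,\mathrm{H_2O} + 6\,\mathrm{O_2^{18}}$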


4. Substituted-response Variety 


In this type the blanks are left to be filled in by the
examinee on the basis of options or responses provided after
the statement. Such items are useful when the content of learning
demands the use of the same type of responses to every
teaching point. In such cases, the same alternatives may be
used for filling up the blanks by substituting the relevant
alternative.




Example 
Deficiency of vitamin A causes ______ (Night-blindness),
while lack of vitamin C causes ______ (Scurvy).


A. Anemia
B. Beriberi
C. Scurvy
*D. Night-blindness


5. Incomplete-response Variety 


In this type, the stem of the item is followed by four or
five options which are given in the form of incomplete responses.
The testee is given hints in the form of limits of letters,
an incomplete word or a single letter. Such items can be used
profitably only in testing vocabulary, terminology, spellings etc.


Example 

(a) One who is deathless, everlasting and indestructible is
called:
A. o......  *B. i......  C. d......  D. s......  (immortal)


(b) An oval figure is termed as:
A. [illegible]  B. [illegible]  C. [illegible]  *D. el......  (elliptical)


(c) The spelling of the word 'rec-pt' will be correct if the two
blanks are filled up using:
*A. ei  B. ee  C. ie  D. ii


6. Combined-response Variety 


Sometimes, when one is interested in judging a concept more
thoroughly, a combination of responses may be used as the key
in place of the single-response key.


Example (uncombined key)
Lack of vitamin A causes

A. anaemia
B. beriberi
C. scurvy
*D. night blindness


Example (combined key)
Source of vitamin A is

a. Liver  b. Butter  c. Milk  d. Grains

A. a only
B. a+b
*C. a+b+c
D. a+b+c+d


7. Sequenced-response Variety 


This type involves responses which depict some sequence of
items, objects, procedures, events etc. Such items ensure a lot of
coverage of content besides testing students' ability to identify
relationships and classify facts, concepts, events, etc. Since such
items are based on a very comprehensive and broad chunk of the
syllabus, they are very economical per unit of testing time.
However, such items can be framed only on content areas that
involve sequential matter.


Example 
Which of the following sequences represents the evolutionary
development of animals?


A. Snake → Frog → Pigeon → Cow
B. Frog → Pigeon → Cow → Snake
C. Pigeon → Cow → Snake → Frog
*D. Frog → Snake → Pigeon → Cow


8. Negative-response Variety 


Sometimes it is difficult to find four homogeneous alternatives
but three are possible. In order to use these three, the
stem is made negative so as to accommodate the fourth
alternative, which is not related, as the desired key.


Example
Which of the following is NOT necessary for germination of a
seed?


A. Air 

B. Water 
*C. Soil 

D. Warmth 


9. All or None-response Variety 


Sometimes it is difficult to write an item because all the 
responses are good as an answer or key but one finds difficulty 
in identifying distractors. In such cases one may use the fourth 
(last) option as ‘all the above’. Likewise, if one is not able to 
find even a single good alternative relevant to the situation 
depicted in the stem, then relevant distractors are chosen as 
alternatives and the fourth (last) option may be listed as 'None
of the above', as illustrated below:


Example 
Which among the following phenomena is based on the principle
of air pressure?


A. Working of water pump 
B. Filling in of ink in a fountain pen 


C. Working of a siphon 
*D. All the above


Which of the following phenomena reflects the principle of 
Archimedes? 


A. Rising of water in a straw pipe when sucked in 
B. Working of a siphon 
C. Working of a water pump 

*D. None of the above 


10. Multiple Selection/Completion Variety 


In this type, four choices are given as A, B, C and D, and the
key may comprise more than one of these responses. The
student is required to find the correct answer in terms of the
stem of the item, the answer being only A or B or C or D, each
having more than one response combined under it. These
items can be used when we want to test students' ability to cite
examples, compare, discriminate or analyse a situation
involving allied concepts or ideas.


Example 


Which of the following biologists have worked on the theory of
organic evolution?


1. Darwin 2. Lamarck 3. Mendel 4. Wallace


Select: 
A. if 1, 2 and 3 are correct 
B. if 1, 3 and 4 are correct 
C. if 2, 3 and 4 are correct 
*D. if 1, 2 and 4 are correct 


11. Worst-answer Variety 


In such items, all the four or five alternatives are relevant 
except the one which is the worst. Students are asked to choose 
the worst and write its number in the space on the right. Such 
items can be used to test students' ability to detect errors, 
irrelevance and incongruency in statements. 


Example 
A rhombus is a geometrical figure in which 


A. all sides are equal 
*B. all angles are equal 

C. opposite sides are equal 
D. opposite angles are equal 


12. Most-inclusive Variety 


Four or five words are given in a set, one of which includes
the other three or four. A student is to select the word which
is the most inclusive, i.e. covering the others, and write its
serial number in the space provided on the right. Such items can
be used to test vocabulary, terminology, relationships,
similarities etc.


Example 
(a) A. Beriberi B. Typhoid C. Disease D. Malaria    C
(b) A. Spirogyra B. Ulothrix C. Chlamydomonas D. Algae    D


13. Most-dissimilar Variety 


A set of four words or terms is given, one of which does
not belong with the other three. Students are to choose the most
dissimilar one and write its serial in the space provided on the
right. This type is useful to test students' ability to identify
relationships and classify on the basis of an implicit criterion.


Example 

1. A. Cuttle fish  B. Star fish  C. Shark fish  D. Devil fish    C
2. A. Jahangir     B. Altmash    C. Akbar       D. Babar         B
3. A. Hard         B. Last       C. Fast        D. Cast          D


14. Cause-effect Variety 


In this case events representing causes (or effects) are given, 
out of which one is the effect (or cause) respectively. Students 
are required to select the effect (or cause) and write its serial in 
the space provided on the right. Such items can be used to 
emphasise (in instruction) students' ability to establish the 
cause and effect relationship. 


Example 
1. Which of the following events is an effect of the other three 
causes of the downfall of Mughal empire? 


A. Weak successors after Aurangzeb 

B. Religious policy of Aurangzeb 

C. Faction fighting among nobles 
*D. Rise of independent kingdoms 




2. Which of the following events is the cause of the other three
effects?


A. Rise of independent states
*B. Lack of political insight of Aurangzeb
C. Consolidation of Marathas in Deccan
D. Invasion of Nadir Shah to loot Delhi


15. Analogy-type Items 


In this case three words, terms or concepts are given, show- 
ing a relationship between the first and the second on the basis 
of which the fourth word is to be identified out of 4 or 5 
given as alternatives. 


Example 
Identify the fourth term in the following items and indicate by 
encircling the serial number of the correct response. 

(i) Sun : Mercury : : Earth : ______


A. Jupiter
B. Mars
*C. Moon
D. Venus


(ii) Gujarat : Ahmedabad : : Manipur : ______


A. Dispur 
B. Shillong 
*C. Imphal 
D. Kohima 


16. Multiple Completion/Selection Variety 


In this case, the stem of the item is followed by options
having one or more than one correct response. A student is
required to select the correct response or responses. However,
to facilitate scoring, the responses are coded for various
combinations of responses so that each item has only one lettered
option as its key.


Example 
Indian Prime Ministers since 1947 include 


I. Lal Bahadur Shastri
II. Radhakrishnan
III. Charan Singh
IV. Morarji Desai


Select
 A. I and II only              A. if 1 & 2 are correct
 B. II, III and IV only   OR   B. if 2, 3 & 4 are correct
*C. I, III and IV only        *C. if 1, 3 & 4 are correct
 D. I, II, III and IV          D. if 1, 2, 3 & 4 are correct


OR when each item consists of 4 options arising from the three 
numbered responses the code may be: 


Select A. if 1, 2 and 3 are correct 
B. if 1 and 2 only are correct 
C. if 3 only is correct 
D. if some other single response or combination of res- 
ponses is correct 


Here one code is left unspecified (D) and the student must
show whether each option, considered on its own, is or is not
the correct answer. It is therefore a more thorough probing,
allowing for the least guess work. Such items sometimes lead to
muddy measurement if the aim is not clear. If the intention
is to find out whether a student knows which of the possible
answers are correct and which are not, then such items
can be used more profitably.
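
The coding schemes above amount to a lookup from the set of correct numbered responses to a single lettered option. A minimal sketch follows; the table and function names are hypothetical and only mirror the two codes printed in the example.

```python
# Code used in the Prime Ministers example: each lettered option stands for
# one fixed combination of the numbered responses.
FIXED_CODE = {
    frozenset({1, 2}): "A",
    frozenset({2, 3, 4}): "B",
    frozenset({1, 3, 4}): "C",
    frozenset({1, 2, 3, 4}): "D",
}

# Code for three numbered responses in which D is left unspecified.
CATCH_ALL_CODE = {
    frozenset({1, 2, 3}): "A",
    frozenset({1, 2}): "B",
    frozenset({3}): "C",
}


def fixed_key(correct_responses):
    """Return the lettered key, or None if the combination has no code."""
    return FIXED_CODE.get(frozenset(correct_responses))


def catch_all_key(correct_responses):
    """Return the lettered key; any uncoded combination falls under D."""
    return CATCH_ALL_CODE.get(frozenset(correct_responses), "D")


def score(chosen_letter, keyed_letter):
    """One mark for the single lettered key, zero otherwise."""
    return 1 if chosen_letter == keyed_letter else 0


# Responses 1, 3 and 4 (Shastri, Charan Singh, Desai) are correct.
print(fixed_key({1, 3, 4}))              # C
print(catch_all_key({2}))                # D
print(score("C", fixed_key({1, 3, 4})))  # 1
```

With the fixed code an item writer must check that the true combination is actually among the four listed; with the catch-all code every combination maps to some letter, which is why each numbered response has to be judged on its own.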


17. Assertion/Reasoning Variety 


Such items are based on two statements, the first being
coded as 'assertion' and the second as 'reason', linked by the
word 'because'. In fact, such items provide a technique for
converting true-false items into the multiple-choice format.


Example 


An iron needle sinks in water while a toy ship having the same
weight floats. (Statement 1)


because 


the volume of water displaced by the needle is more than the 
weight of the needle. (Statement 2) 


Select A. if both the statements are correct and the second is
the explanation of the first.
*B. if both statements are correct but the second is NOT
the explanation of the first.
C. if the first statement is correct but the second is
incorrect.
D. if the first statement is incorrect but the second is
correct.


Such questions more often test students’ understanding of 
concepts involving higher level abilities and not mere recall. 


18. Diagrammatic-response Variety 


In this case, instead of verbal responses, diagrams may be
given as responses (options) and students are required to select
the correct diagram (key) out of the 4 or 5 given. Such items
are useful to test perceptual ability along with functional
understanding.


Example 


In which of the following diagrams are all the sides as well as
the angles equal?


[Figure: diagram options]


19. Diagram-based Response Variety 


In this case a diagram, table, chart or picture is given and
verbal responses are provided based on the given diagram.
Such items can be used to test various abilities depending upon
the nature of the graphic stimulus used.


Example 
In the given electronic diagram of an animal cell the organelle
which is concerned with respiration is indicated by the label


A. 1  B. 2  *C. 3  D. 4


Animal Cell 


SUGGESTIONS FOR CONSTRUCTING
MULTIPLE-CHOICE ITEMS


A. General 


(i) Develop multiple choice items on the basis of independent,
meaningful and important ideas.


Example
Poor: Which of the following formulae is used to estimate the
reliability of a test by the split-half method?
A. Spearman-Brown
B. Cronbach
C. Kuder-Richardson
D. Guttman


Better: The Spearman-Brown formula is useful in estimating
reliability when


A. the same test has been given twice.
B. the only available data are the number of items in the
test and the mean and standard deviation of the
scores.
C. one knows the number of items in the test.
*D. a coefficient of correlation between scores on odd and
even numbered items in a test has been calculated.
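
The keyed response in the item above describes the split-half procedure: correlate scores on the odd and even numbered items, then step the half-test correlation up with the Spearman-Brown formula, r_full = 2r / (1 + r). A minimal sketch, with hypothetical function names and made-up item scores used purely for illustration:

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per pupil, one 0/1 score per item."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_totals, even_totals))


# Illustrative data only: 5 pupils, 6 items each.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 2))
```

The sketch assumes both halves show some spread in scores; if every pupil has the same total on one half, the correlation is undefined.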


(ii) Item should be strictly true 


Example 
Poor: Why is blood plasma often preferred to whole blood for
transfusion?


A. Whole blood may carry disease germs. 
*B. Whole blood must be “typed” to match the blood of 
the patient. 
C. Plasma can be prepared synthetically. 
D. Plasma contains more disease-fighting white corpuscles
than whole blood.


It is not correct that blood plasma is preferred to whole
blood, because if whole blood of the proper type is available,
it is usually preferred. The stem should ask 'What advantage
does blood plasma have over whole blood?' because it is true that
plasma need not be typed.


(iii) Frame questions using definite and concise language to get 
the intended response without further qualifications 


Example 
Poor: Which one of the following is the best source of heat
for home use?


*A. Coal B. Electricity C. Gas D. Oil 


Better: In the northern districts of Himachal Pradesh which 
one of the following is the most economical source of 
heat for home use? 


*A. Coal B. Electricity C. Gas D. Oil 


(iv) Language of the item should be comprehensible 


Example 
Poor: A chaotic situation means 


A. Asymptotic *B. Confused C. Gauche
D. Permutable


Better: A. [illegible] *B. Confused C. Tactless
D. Interchangeable


(v) Avoid basing items on unique organisation 


Example 
Poor: The first characteristic of a living organism is its
ability to


A. respire B. reproduce C. grow D. move 
Better: The first Mughal King was 


A. Akbar *B. Babar C. Jahangir D. Humayun 


(vi) As far as possible write an item in a positive question form 
rather than in a negative and incomplete form 


(vii) Avoid interlocking of items 


Example 
Poor: A seed can germinate in the absence of 


A. Air *B. Light C. Moisture D. Warmth


Better: Air is essential for germination of a seed because it
provides to the growing embryo


A. Oxygen B. Carbon dioxide C. Nitrogen
D. Hydrogen


If both these items are used in the same test, the second one 
provides a clue to the first. 




B. Stem of the Item 


The following criteria may be observed for writing the stem part 
of the item: 


(i) Task must be set in the stem itself
Poor: Essay type questions


*A. have full freedom of response.
B. are free from subjectivity in scoring.
C. are more reliable than objective type.
D. are easy to grade.


Better: Essay type questions are characterised by


A. ...  B. ...  C. ...  D. ...


(ii) Pose a single central problem as far as possible 
Poor: Scurvy and night-blindness are caused due to the
deficiency of


*1. Vitamin-A 2. Vitamin-B
*3. Vitamin-C 4. Vitamin-D


Better: Scurvy and night-blindness are caused due to the
deficiency of vitamins


1. A and B  2. B and C  3. C and A  4. C and D


(iii) Include as much of the item in the stem as possible to
reduce the reading load (5)
Poor: Which is the best definition for a vein?


A. A blood vessel that carries blue-blood.
*B. A blood vessel that carries blood going to the heart. 
C. A blood vessel that carries impure blood. 
D. A blood vessel that carries blood away from the 
heart. 


Better: A vein is a blood vessel that carries 




A. blue-blood 
B. impure blood. 

*C. blood going to the heart 
D. blood away from the heart 


(iv) Avoid using a negative stem as far as possible


Poor: In the definition of a mineral which of the following
properties is NOT relevant?


A. It is produced by geologic processes.
B. It has distinctive physical properties.
C. It contains one or more elements.
*D. Its chemical composition is variable.


Better: In the definition of a mineral which of the following
properties is relevant?


A. It is produced by the sedimentary process.
B. It has distinctive chemical properties.
C. It contains two or more elements.
*D. It has a varied chemical composition.


(v) Avoid double negatives in the stem

Poor: Questions which are not of the supply type cannot be
classified as


A. True-false type
B. Multiple choice type
*C. Essay type
D. Matching type


(vi) Avoid window-dressing (irrelevant material)

Poor: Gill is suffering from night-blindness. His family
doctor advised him to take more milk daily. This is
due to the fact that milk contains


*1. Vitamin-A 2. Vitamin-B 3. Vitamin-C 4. Vitamin-D


Better: Milk is useful to a patient of night-blindness because it
provides


*1. Vitamin-A 2. Vitamin-B 3. Vitamin-C 4. Vitamin-D


(vii) Avoid stereotypes in the stem
Poor: For every action there is an equal and opposite
reaction. Which of Newton's Laws does this statement
refer to?


A. First B. Second C. Third D. None of these. 


Better: When a ball is kicked against a wall it comes back 
immediately. Which of the Newton's Laws does this 
example illustrate? 


A. First B. Second C. Third D. None of these


(viii) Avoid asking for opinion instead of intended outcome 

Poor: What do you consider is the most appropriate form of 
questions for testing student's ability to comprehend a 
passage? 


*A. Short-answer type
B. Multiple-choice type 
C. Matching type 

D. True-false type 


Better: Which of the following forms of questions is most
suitable for testing students' ability to comprehend a
passage?


*A. Short-answer type 
B. Multiple-choice type 
C. Matching-type 
D. True-false type 


(ix) Provide for all necessary qualifications in the stem
Poor: What change would occur in the composition of air in
a room in which green plants are growing?



A. Carbon dioxide increases and Oxygen decreases. 
*B. Oxygen increases and Carbon dioxide decreases. 

C. Both Carbon dioxide and Oxygen decrease. 

D. Both Carbon dioxide and Oxygen increase. 


Better: What change would occur in the composition of air in
a lighted, air-tight room in which the only living things
growing are green plants?


A. ...  B. ...  C. ...  D. ...  (6)


(x) Avoid instructional aids 

Poor: Cells combine to form tissues and tissues make up an 
organ. The relationship between a tissue and an organ 
is similar to the one between 


A. Fats and carbohydrates 
*B. Organ and the system 

C. Liver and the gall bladder
D. Ectoderm and endoderm 


Better: The relationship between a tissue and an organ is simi- 
lar to the one between 


A. ...  B. ...  C. ...  D. ...


C. Responses 


(i) Provide for 3 to 5 alternatives 

Three to five alternatives may be used, depending upon the nature
of the teaching/learning point and the maturity level of the
pupils. Fewer choices for younger pupils and more for the
test-sophisticated may be useful. Three well-chosen choices are
quite usable in classroom testing if four choices are not easily
available.


(ii) Item should admit only one correct or best answer 
Poor: Fish can be preserved in salt water because 



A. salt acts as a poison for bacteria.
*B. bacteria cannot withstand the osmotic action of 
salt. 
C. salt alters the chemical constituents of the food. 
D. salt protects the fish from contact with air. 


In this item responses A and B could both be judged as
correct, as response B simply explains response A, which
is also correct.


Better: Fish can be preserved in salt water because 


A. bacteria can survive more easily in salt water. 
B. bacteria cannot withstand osmotic action of salt. 
C. salt alters the chemical constituents of the food. 
D. salt protects the fish from contact with air. (7) 


(iii) Avoid clues of various types


(a) Format clue (wording of responses) 

In the example (Poor) given above under (ii), the difference in
the first word provides a clue to the designated response,
i.e. 'B', because in all three distractors the response begins
with 'salt', while in response 'B', which happens to be the
correct answer, the first word is 'bacteria', quite different
from the others.


(b) Associational clue 
Poor: Some of the observable differences between organisms
of the same kind are called


A. heredity B. adaptation
*C. variation D. heterophylly


In this case 'C' has a verbal association with the word
'differences', as variation and difference are synonymous, and a
student can guess the correct answer simply from this clue.

(c) Grammatical clue
Poor: Which of the following plants bears naked seeds?


A. Grasses B. Lilies *C. Pine D. Ferns


In this case the ‘key’ is in the singular and distractors are 
presented in the plural form. 


Better: A. Grasses B. Lilies C. Pines D. Ferns 


Poor: He ran fast on seeing the dog. In this sentence the word 
‘fast’ is used as 


A. a verb B. a noun 
C. a preposition *D. an adverb 


In this case the article ‘an’ in the key provides a clue for the 
correct answer. 


Better: A. a verb B. a noun C. an adjective *D. an adverb


(d) Verbal clue by repetition of a word from the stem 
Poor: The relationship between a cell and a tissue is almost 
similar to the one between 


A. Cell and the system 

*B. Tissue and the organ 
C. Liver and the gall bladder 
D. Ectoderm and the endoderm 


In this item the word ‘tissue’ provides a clue due to verbal 
association with the word ‘tissue’ in the stem. 


Better: Replace response B with 'Organ and the system'.


(e) Response length clue
Poor: A seed can germinate in the absence of


A. Moisture 

B. Oxygen 

C. Temperature 
*D. Amount of light



In this case the length of response ‘D’, which happens to 
be the key, gives a clue. 


Better: D. light. 
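
A response-length clue of this kind can be screened for mechanically before a test is printed. The sketch below flags an item when the key is much longer than the average distractor; the function name and the threshold of 1.5 are arbitrary assumptions chosen for illustration, not values taken from this handbook.

```python
def has_length_clue(options, key, ratio=1.5):
    """Return True if the keyed response is noticeably longer than the
    average distractor (length measured in characters).

    options -- dict mapping option letter to response text
    key     -- letter of the keyed response
    ratio   -- assumed threshold; tune it to the item bank in use
    """
    key_length = len(options[key])
    distractor_lengths = [len(text) for letter, text in options.items()
                          if letter != key]
    mean_distractor = sum(distractor_lengths) / len(distractor_lengths)
    return key_length > ratio * mean_distractor


poor = {"A": "Moisture", "B": "Oxygen", "C": "Temperature",
        "D": "Amount of light"}
better = {"A": "Moisture", "B": "Oxygen", "C": "Temperature",
          "D": "Light"}

print(has_length_clue(poor, "D"))    # True
print(has_length_clue(better, "D"))  # False
```

A check like this only catches a conspicuously long key; a key that is conspicuously short, or the other clue types listed here, would need their own checks and an item writer's judgement.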


(f) Sentence completion clue 
Poor: When dilute sulphuric acid acts on zinc


A. Carbon dioxide
B. Sulphur dioxide
C. Nitrogen
*D. Hydrogen is formed


Here a clue is given in ‘D’ by completing the sentence. 


Better: Which gas is evolved when dilute sulphuric acid acts 
on Zinc? 


A. Carbon dioxide
B. Sulphur dioxide
C. Nitrogen
*D. Hydrogen


(g) Clue by determiners like 'never', 'always', 'seldom',
'generally'
Poor: Lack of vitamin-C in diet results in


A. Nervous disorder
B. Night-blindness
C. Beriberi
*D. Bad gums generally


The word 'generally' provides the clue to the answer.

Better: Lack of vitamin-C in diet results generally in


A. Nervous disorder
B. Night-blindness
C. Beriberi
*D. Bad gums


(iv) Use heterogeneous responses or homogeneous responses
depending upon the intended difficulty level


Heterogeneous responses        Homogeneous responses
Cuscuta is an example of       Cuscuta is an example of
A. an angiosperm               A. a hydrophyte
B. a gymnosperm                B. a saprophyte
C. a fern                      C. an epiphyte
*D. a parasite                 *D. a parasite


(v) Use only plausible distractors 
Which of the following types of questions ensures the
LEAST scoring uniformity?


Poor                       Better
A. Multiple choice type    A. Essay type-restricted response
B. True-false type         B. Short answer type
C. Matching type           C. Very short answer type
*D. Essay type             *D. Essay type-extended response


(vi) Responses should be as brief as possible 
Poor: According to Bloom's taxonomy, the comprehension
objective means


A. the ability to recall, translate and interpret 

B. the ability to translate, analyse and interpret 

C. the ability to interpret, extrapolate and evaluate 
*D. the ability to translate, interpret and extrapolate 


Better: According to Bloom's taxonomy, the ability to
translate, interpret and extrapolate is covered under the
objective


A. Knowledge 
*B. Comprehension 
C. Application 

D. Analysis 


(vii) Avoid use of stereotypes except to misguide the rote learner


Poor: In the life history of the frog the tadpole resembles a
fish. This phenomenon illustrates that


*A. ontogeny repeats phylogeny (poor use)
B. fishes have evolved from amphibians
C. toads and frogs have arisen simultaneously
D. fishes and frogs have evolved from a common ancestor


Good: The greatest improvement in livestock has been
brought about by


*A. selective breeding
B. organic evolution
C. environmental influence
D. survival of the fittest (good use)


(viii) Avoid lifting a statement from textbooks


Poor: How adequately the content of the test samples the
domain of the subject matter about which inference is
to be made, is termed as


*A. content validity
B. concurrent validity
C. predictive validity
D. construct validity


Better: A geography unit-test which is based on the content
elements of that unit and the instructional objectives
indicated in its blueprint reflects its


*A. content validity
B. concurrent validity
C. predictive validity
D. construct validity


(ix) Write the responses in sequential order if one exists

Poor: The population of India is about


A. 400 millions B. 550 millions
*C. 650 millions D. 750 millions


Better: If the radius of earth is increased by 5 feet its circum- 
ference at the equator would be increased by about 


A. 6 feet *B. 31 feet C. 75 feet D. 300 feet (8)
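
The keyed value in this item can be checked with a short calculation. Writing r for the earth's radius in feet (a symbol introduced here only for the working), the increase in circumference is:

```latex
\Delta C = 2\pi (r + 5) - 2\pi r = 2\pi \times 5 = 10\pi \approx 31.4 \text{ feet}
```

The radius cancels, so the answer does not depend on the size of the earth, and the options listed in ascending order let the examinee locate the computed value quickly.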


(x) Use 'all of these' or 'none of these' appropriately and
sparingly
When used, make them correct answers as well as distractors
as frequently as the other responses.

Poor:
(i) Which of the following terms is closest in meaning to
'growth'?
A. Maturation
B. Learning
C. Development
D. None of these (poor)

(ii) Mughal empire can be regarded as
A. controlled democracy
B. benevolent monarchy
C. parliamentary democracy
D. all of these (poor)

Good:
(i) Which of the following words is mis-spelt?
A. Receipt
B. Belief
C. Perceive
D. None of these (good)

(ii) Mughal empire can be regarded as
A. controlled democracy
B. parliamentary democracy
C. presidential form of government
D. None of the above types (good)


(xi) Ensure that all responses are independent of each other 
Poor: Muslim population in India is more than 


A. 70 millions B. 100 millions 
C. 150 millions D. 200 millions 


In this case the options overlap: A and B could both be correct,
since in both cases the population will be more.
Better: Muslim population in India is about



A. 70 millions B. 100 millions 
C. 150 millions D. 200 millions 


(xii) Avoid non-parallel responses 

Sometimes one of the four or five responses is not parallel
to the others and becomes functionless, as illustrated below:

Which of the following organisms does not have the capacity
to manufacture its own food?


Poor                            Better
A. A fern                       A. A fern
B. A moss                       B. A moss
C. An alga                      C. An alga
*D. An animal (not parallel)    *D. A fungus (parallel)


(xiii) Use qualitative scale of responses when the answer varies 
from complete establishment to complete indefiniteness 


Desirable
Quite a few cases of dengue have been attributed to a virus. What
was the status of this idea in 1960?


A. This was well-established by medical sciences
B. It was a controversial matter and the evidence was
inconclusive
C. This was completely rejected on the basis of sufficient
evidence
*D. This was a recent development and needs further evidence
and investigation


(xiv) Combine two elements to give four responses when more 
coverage is desired or all the responses are not possible 


Desirable
If an essay test of six questions is replaced by three essay-type
questions and ten short-answer type questions, it is likely to
improve

*A. Validity as well as reliability
B. Validity but not the reliability
C. Reliability but not the validity
D. Neither validity nor reliability


(xv) Avoid placing items in tandem (Not desirable) 

Sometimes the responses are placed in tandem, i.e. one after the
other, to save space, but this makes it much more difficult for
the student to compare the different responses when selecting
one.


Example 
The balance-sheet report of Gill's company would reveal 


A. company's profit for the previous fiscal year 
B. the amount of income tax paid 
C. the amount of sales for the period and 

*D. the amount of money owed to its creditors 


(xvi) Provide alternative definitions rather than words as
alternatives while testing understanding of definitions


Not desirable 


From the lakes, rivers and oceans, water gets into the air by a
process called


A. Filtration B. Condensation *C. Evaporation D. Distillation


Desirable 
Evaporation is a process by which 


A. vapours turn into liquids 

B. liquids turn into solids 
*C. liquids turn into vapours 

D. solids dissolve into liquids 


CHAPTER XI 


OBJECTIVE-TYPE ITEM 
(OTHER VARIETIES) 


1. Constant Alternatives Type 
(True-False and allied varieties) 


1.1. Format 


The students are required to select the answer from two or
more choices or alternatives which remain the same for a whole
series of items. These items are usually in the form of
statements about whose truth or falsity the pupil is to make a
judgement. In certain varieties, instead of a statement, a direct
question may be asked for pupils to respond in a Yes-No form
as given below:


Is lotus a water plant? Yes-No: 
Lotus is an aquatic plant. True-False: 


Responses of this type may take the form of True-False, Yes-No,
Agree-Disagree, Synonym-Antonym etc. The statement is
preceded by a clear direction regarding the stipulation of
responses and mode of responding.

In continuation of the previous chapter on objective-type
questions, this chapter deals with other varieties of questions,
like those of constant alternatives, matching type,
rearrangement-type, pictorial and structured questions. It is
intended that the reader after going through this chapter will be
able to:


(a) recognise the format, design and varieties of these
questions,

(b) identify examples and non-examples of different varieties 
of such questions, 


(c) apply the basic principles of construction of these questions
in framing questions of these types,

(d) describe the nature, scope, and limitations of these 
questions, 


(e) select and use the relevant variety of items for testing the 
intended abilities, and, 


(f) suggest improvement in a given set of defective items. 


The following five major varieties other than multiple choice 
type are described here-in-after. 


1.2. Forms of Constant Alternatives 
(a) True-False variety 
(i) Temperature remaining the same, the pressure
in a gas varies in direct proportion to its
volume. T—F
(ii) Star fish belongs to the class Pisces. T—F


(b) Right-Wrong variety
(i) Pinto ran fast in the 100 meter race. R—W
(ii) Ahmad as a fast runner of 100 meter race. R—W

(c) Agree-Disagree variety
(i) Religious dogmas lead to [illegible] of
religion. A—D
(ii) Every religion teaches hatred for other
religions. A—D


(d) Yes-No variety 


(i) Does a water pump work on the principle of
water pressure? Yes—No
(ii) Is Cinchona a medicinal plant? Yes—No


(e) Synonym-Antonym variety 
Underline 'S' if the sentence reflects the use of two words
similar in meaning and 'A' if they are opposite in meaning.


(i) Delhi weather during May is very uncomfortable
and uneasy. S—A
(ii) Hard water is not fit for drinking but soft
water is. S—A


(f) Correction variety 
Underline the false part of the statement and write the 
correction in the bracket provided against each statement. 


(i) Water consists of two parts of hydrogen
and two parts of oxygen by volume. (one part)
(ii) Water is a chemical element. (Compound)


(g) True-False—Doubtful variety (T.F.O.) 
Underline T if true, F if false and O if the statement 
reflects an opinion. 


(i) Mammals suckle their young. T—F—O 
(ii) Earth revolves round the moon. T—F—O
(iii) There are no plants on Mars. T—F—O 


(h) Cluster variety
In this case a complete stem followed by a number of
statements is given for students to mark true or false.

(i) The volume of a mass of gas tends to
(a) increase with increase in temperature. T—F
(b) increase with increase in pressure. T—F
(c) decrease with increase in pressure. T—F
(d) decrease with increase in temperature. T—F

(i) T-F-TF variety
Against each statement underline T if the statement is
true under all circumstances, F if it is false under all
circumstances, and TF if it is true under certain circum-
stances and false under other circumstances.

(i) Sum of the angles of a triangle is equal to
two right angles. T—F—TF
(ii) Respiration takes place in green cells only. T—F—TF
(iii) Living cells can manufacture food. T—F—TF


(j) Converse True-False variety (T.F.C.T.C.F.)
Underline T if the statement is true, F if it is false; also
mark C.T. if the converse is true and C.F. if the converse is false.

(i) A square is a quadrilateral. T—F—CT—CF
(ii) Every parallelogram is a rhombus. T—F—CT—CF
(iii) An equilateral triangle is also equi-
angular. T—F—CT—CF
(iv) All parasites are animals. T—F—CT—CF


(k) Cause-effect relationship variety
Certain words are italicised in the following statements.
Underline the word ‘cause’ if they reflect a cause and under-
line ‘effect’ if they are effects of certain causes.

(i) Tides are formed during full moon. Cause—Effect
(ii) Due to absence of chlorophyll a fungus
cannot manufacture its food. Cause—Effect
(iii) Leaves of certain plants become yellowish
in the absence of light. Cause—Effect
(iv) Aurangzeb's religious policy was responsible
for the downfall of the Mughal Empire. Cause—Effect


(l) Qualified True-False variety
Students are required to judge whether the statement is true
or false and whether the item can be made true by qualify-
ing the statement with some more information. In
the following statements, add qualifications to make the
item completely true, if needed. Underline R or W.

(i) All leaves can manufacture food.
Add: ‘green’ before ‘leaves’ (R—W)
(ii) Under the Indian constitution the President can dis-
miss a Cabinet Minister.
Add: on the advice of the Prime Minister (R—W)
(iii) A speaker of a legislative assembly does not belong
to any political party.
No qualification required (R—W)


(m) Diagram-based variety 
In certain cases diagrams can be given and students may 
be required to make small judgements after observing the 
diagrams. 


[Figure: Door A and Door B]


Observe the two diagrams of Door—A and Door—B with 
two iron bars joined in the frame. 


(i) Door—A will become stronger than Door B T—F 
(ii) Door—A will require more bar length than 
Door—B T—F 


1.3. Guidelines for Construction 


The following examples illustrate suggestions which
may be considered useful for constructing true-false items:


(a) Item should be based on some significant and useful pro- 
positions: 


Poor : Relationship between a parasite and the host plant 
is important. 
Better: A parasite gets its food from the host plant. 




(b) Statement should be made absolutely true or absolutely false 
by seeking experts' unanimity. 


Poor : All acids are liquids. 
Better: All acids turn blue litmus red. 


(c) Necessary background or qualification must be incorporat- 
ed in the statement to pinpoint the intended response. 


Poor : Objective-type questions are better than essay-type 
questions. 

Better: From the point of view of objectivity in scoring, 
the objective-type questions are better than essay- 
type questions. 


(d) Avoid half-true and half-false statements. 


Poor : The sun is a star while the Earth is a satellite. 
Better: The sun is a star while the Earth is a planet. 


(e) Avoid specific determiners like ‘always’, ‘frequent’, ‘never’ 
etc. as they provide clues. 


Poor : Scoring objectivity in essay-type questions has
always been questioned.


Better: Grading of essay-type questions is questioned for 
subjectivity in scoring. 


(f) Express single idea in each statement. 


Poor : Solar and Lunar eclipses are formed when the
Moon comes between the Sun and the Earth or the
Earth comes between the Sun and the Moon.

Better: A Lunar eclipse is formed when the Earth comes 
between the Sun and the Moon. 


(g) Use unambiguous and precise language to make the mean-
ing clear.


Poor : It is possible to determine if a solution is acidic by
the red colour formed on the litmus when it is
inserted into solution.
Better: An acidic solution turns blue litmus red.


(h) Textbook wording and phrases may be avoided to dis- 
courage the rote learner. 


Poor : For every action there is an equal and opposite
reaction.

Better: A ball when struck against a wall rebounds due to 
Newton's third law of motion. 


(i) Avoid the use of negative statements and especially double
negatives to make a true statement false or a false state-
ment true.


Poor : The Earth is not a planet.
Better: The Earth is a planet.
Poor : None of the articles in the Constitution is unneces-
sary.
Better: All the articles in the Constitution are necessary.


(j) Avoid items expressing opinions (unless you use T.F.O. 
variety). 


Poor : The first Indian will go into outer space next
December. (opinion)
Poor : Plants and animals can live on Mars. (opinion)


(k) Avoid giving two ideas in one statement unless it depicts a
cause-effect relationship.


Poor : Green leaves can manufacture food but colourless
leaves cannot.
Better: Green leaves can manufacture food because they
contain chlorophyll.


(l) The true-false statements should be of approximately
the same length.


(m) The numbers of true and false statements should be
approximately the same.
However, some studies indicate that false statements
should outnumber true statements (about 60% false)
because of their better discriminating power, which
may be due to the ‘acquiescence response set’, i.e. a student
is more likely to accept a statement than reject it when he
is not sure of the knowledge involved in it.


1.4. Steps in Constructing Objective-based True-False Items


Superficially, it looks quite easy to frame true-false items. In
fact, developing a good item of this type involves some
necessary steps.

These are:


(a) Selection of a good proposition/concept/principle
(b) Restatement of the essence of the proposition/concept/idea
(c) Finding implications of the proposition/concept
(d) Couching the antithesis of the proposition in words and
phrases
(e) Writing the item using the same essential point for one true
and one false version.

The following example clarifies the underlying idea for
framing such questions.


Concept: Structure of leaf is related to its functions.


(a) Restate the essential idea in different words. (Recalls)
True : (i) Plants floating in water have abundant stomata
in their leaves.
False: (i) Plants floating in water are characterised by the
absence of stomata in their leaves.

(b) Restate a part of original idea. (Recalls)
True : (ii) Water plants with floating leaves have more
stomata on the upper surface than on the lower surface.
False: (ii) There are more stomata on the lower surface
than on the upper surface of the floating leaves of water plants.

(c) Relate the basic idea to some other ideas. (Sees relationship)
True : (iii) Abundant stomata on the upper surface of floating
leaves are meant to enhance transpiration.
False: (iii) Abundant stomata on the upper surface of floating
leaves are meant to reduce transpiration.

(d) Develop implications of basic ideas. (Sees relationship)
True : (iv) Rate of transpiration in floating leaves is directly
proportional to the number of stomata on the upper side.
False: (iv) Rate of transpiration in floating leaves is inversely
proportional to the number of stomata on the upper side.

(e) Infer the effect of different circumstances. (Infers)
True : (v) If all the stomata on the upper surface of the
floating leaves of a water plant are smeared with vaseline,
the leaves would rot.
False: (v) If all the stomata on the lower surface of the
floating leaves of a water plant are smeared with vaseline,
the leaves would rot.


1.5. Usability of True-False Items 


The first criticism levelled against true-false items is
their susceptibility to chance error resulting from guessing. A
student guessing blindly would expect to get a score of 50%. Of
course, the assumption of blind guessing is not realistic because
all the students are not likely to guess blindly on all the items in a
test. Secondly, such items are generally judged as testing
trivia. Thirdly, such items, if culled from the textbook,
encourage sheer verbal memory. Fourthly, many statements do
not provide enough background information or qualifications
for an item to be judged completely true or completely false. Fifthly,
these items do not provide explicit alternatives, thereby
making it difficult to judge the relative truth or falsity of the
item. Sixthly, true-false items are considered psychologically
harmful because they have a negative suggestion effect. Finally,
these items favour aggressive individuals who are willing to take
chances.

In spite of the various shortcomings listed above, these items
do have a number of advantages, especially for classroom
testing. Firstly, they provide a simple, direct and funda-
mental test of students’ knowledge. Secondly, true-false items
are quite efficient because the number of scoreable responses
per hour of testing time tends to be much higher than that of
multiple-choice items. Thirdly, most teachers find the task
of writing true-false items simpler and less time consum-
ing than that of writing equally good multiple-choice items.
Fourthly, their construction demands less technical
competence. Fifthly, such items are good
for testing cause-effect relationships and for discriminating
between facts and opinions etc. (2)


1.6. Improving Discrimination of True-False Items


The most serious defect of true-false items is that the
correct answer can be reached by chance, without adequate under-
standing of the concept involved. When such a test is taken by
students, the difference between those who answer an item correctly
and those who answer it incorrectly is not as great as it should
be. The reason is that those who answer correctly include
students:


(a) who understand thoroughly the concept involved and
deserve full credit for their answer;

(b) who do not have full grasp of the concept but are more
or less lucky to choose the correct answer and therefore
do not deserve full credit as those in the first category.


Among those who answer incorrectly are those


(a) who were misinformed and therefore whose incorrect
answer was expected;


(b) who had partial information relevant to the concept in-
volved and whose incorrect answer was more or less due to
their bad luck.


To improve the discrimination of these items Ebel (1966)
suggested five choices in place of two (T, F), depending upon
the confidence an examinee has in the correctness
of his response. If he marks a statement as only possibly true
or possibly false, he earns a single credit (1 mark) for a
correct answer, with no penalty for an incorrect one. If he
marks it as probably true or probably false, he earns double
credit (2 marks) for a correct answer but risks a double
penalty for an incorrect one. To discourage blind guessing
he is allowed to omit an
item without losing the score he could expect by blind guessing.
Thus the credit or penalty he gets depends on his confidence
about the correct answer. The responses are weighted as
follows: (3)


Response   Confidence level of response        Score value
Number                                         Right   Wrong   Omit

1          The statement is probably true        2      -2
2          The statement is possibly true        1       0
3          I have no idea                                        0.5
4          The statement is possibly false       1       0
5          The statement is probably false       2      -2


The underlying assumption is that a student should not ask
for double credit on an item unless he is confident enough that
his answer would be correct in at least 2 out of 3 cases. So long
as the proportion is above two-thirds, the student's score will
remain higher if he chooses the higher confidence level (double
credit with double penalty), as illustrated below for 100 items:


Example

Proportion   Low (Possible) Score   High (Probable) Score
correct
80%          80 x 1 = 80            (80 x 2) - (20 x 2) = 120
70%          70 x 1 = 70            (70 x 2) - (30 x 2) =  80
67%          67 x 1 = 67            (67 x 2) - (33 x 2) =  68
66%          66 x 1 = 66            (66 x 2) - (34 x 2) =  64
60%          60 x 1 = 60            (60 x 2) - (40 x 2) =  40
50%          50 x 1 = 50            (50 x 2) - (50 x 2) =   0


Thus a student should not opt for double credit on an item
unless the odds are 2 to 1 or better.
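
The two-thirds rule can be checked numerically. The following is a minimal Python sketch, assuming score values of 2 and -2 for a "probable" response and 1 and 0 for a "possible" response; the function and variable names are illustrative only and are not taken from Ebel.

```python
from fractions import Fraction

# Assumed score values: (credit if right, penalty if wrong).
WEIGHTS = {
    "probable": (2, -2),  # double credit with double penalty
    "possible": (1, 0),   # single credit with no penalty
}


def expected_score(n_items, p_correct, level):
    """Expected total on n_items all answered at one confidence level.

    p_correct should be a Fraction (or a string such as "0.8") so that
    the arithmetic stays exact.
    """
    right, wrong = WEIGHTS[level]
    p = Fraction(p_correct)
    return n_items * (p * right + (1 - p) * wrong)


def better_level(p_correct):
    """Return the level with the higher expected score (ties go to 'possible')."""
    high = expected_score(1, p_correct, "probable")
    low = expected_score(1, p_correct, "possible")
    return "probable" if high > low else "possible"


if __name__ == "__main__":
    for percent in (80, 70, 67, 66, 60, 50):
        p = Fraction(percent, 100)
        print(percent,
              expected_score(100, p, "possible"),
              expected_score(100, p, "probable"),
              better_level(p))
```

Setting the two expected scores equal, 2p - 2(1 - p) = p, gives p = 2/3, which is the break-even point at which neither level has the advantage.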


1.7. Score and Confidence Level


The personality trait of confidence therefore has some relationship
with the score earned under such a procedure. Bold, cautious and
timid examinees may get different scores because of the difference
in this trait. Who among them has the advantage is illustrated
below for a 40-item true-false test in which each student can
answer 20 items with probability p = 0.75 (3 out of 4) and the
remaining 20 items with probability p = 0.60 (3 out of 5).


(a) Student S knows enough about the quality of the answers he
can give. He knows that he is fairly sure of 20 items
(p = 0.75) but not that sure of the remaining 20 (p = 0.60),
and that he should therefore not attempt the latter with high
confidence. So he attempts the 20 items of p = 0.75 with high
confidence and the other 20 of p = 0.60 with low confidence.

(b) Student A is over-confident and attempts all the 40 items
with high confidence.

(c) Student B is timidly cautious and attempts all the 40
items with low confidence.

(d) Student C is neither over-confident nor over-cautious but
unaware of what he knows well (high confidence) and
what he knows poorly (low confidence). He therefore
attempts 10 items of each group with high confidence and
the other 10 with low confidence.


Using the weighted scores, i.e. double credit (2 marks) with
double penalty (-2) for items attempted at a high confidence
level and single credit (1 mark) with no penalty for items
attempted at a low confidence level, we may calculate the
marks of the four types of students to judge who among them
would gain most.


STUDENT S (aware of what he knows well and what he knows less)
20 items (p = 0.75) at high confidence: 15 likely right, 5 likely wrong
Score: (15 x 2) - (5 x 2) = 20
20 items (p = 0.60) at low confidence: 12 likely right, 8 likely wrong
Score: (12 x 1), no deduction = 12
Total score = 20 + 12 = 32

STUDENT A (over-confident)
20 items (p = 0.75) at high confidence: 15 likely right, 5 likely wrong
Score: (15 x 2) - (5 x 2) = 20
20 items (p = 0.60) at high confidence: 12 likely right, 8 likely wrong
Score: (12 x 2) - (8 x 2) = 8
Total score = 20 + 8 = 28

STUDENT B (timidly cautious)
20 items (p = 0.75) at low confidence: 15 likely right, 5 likely wrong
Score: (15 x 1), no deduction = 15
20 items (p = 0.60) at low confidence: 12 likely right, 8 likely wrong
Score: (12 x 1), no deduction = 12
Total score = 15 + 12 = 27

STUDENT C (unaware of what he knows well and what he knows less)
20 items (p = 0.75): 10 at high confidence (7 likely right, 3 likely
wrong) and 10 at low confidence (7 likely right, 3 likely wrong)
Score: (7 x 2) - (3 x 2) + (7 x 1) = 15
20 items (p = 0.60): 10 at high confidence (6 likely right, 4 likely
wrong) and 10 at low confidence (6 likely right, 4 likely wrong)
Score: (6 x 2) - (4 x 2) + (6 x 1) = 10
Total score = 15 + 10 = 25


From the scores of these students it is clear that a
student is likely to score more if he has


(a) knowledge of the subject, and

(b) knowledge about the quality of his answers (confidence
level).


His score will suffer when he is deficient in either. The distri-
bution of a student's score indicates whether he is over-
cautious, over-confident or unaware. Swineford (1941) derived
a score to indicate the tendency of students to gamble. This
score was derived by dividing the number of errors on items on
which the examinee asked for double credit, at the risk of double
penalty, by the total number of errors. Inter-correlations of these
gamble scores on four tests ranged from 0.2 to 0.8, thereby
revealing the tendency of students to gamble. (4)
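
The totals of the four students, and a gamble score of the kind attributed to Swineford, can be reproduced with a short script. This is a sketch only: the data layout, the function names and the use of the likely right and wrong counts from the illustration are assumptions made for the example, not part of the original procedure.

```python
# Each block is (likely right, likely wrong, confidence level).
STUDENTS = {
    "S": [(15, 5, "high"), (12, 8, "low")],
    "A": [(15, 5, "high"), (12, 8, "high")],
    "B": [(15, 5, "low"), (12, 8, "low")],
    "C": [(7, 3, "high"), (7, 3, "low"), (6, 4, "high"), (6, 4, "low")],
}


def weighted_score(blocks):
    """Double credit and double penalty at high confidence, single credit
    and no penalty at low confidence."""
    total = 0
    for right, wrong, level in blocks:
        if level == "high":
            total += 2 * right - 2 * wrong
        elif level == "low":
            total += right
        else:
            raise ValueError(f"unknown confidence level: {level!r}")
    return total


def gamble_score(blocks):
    """Errors on double-credit items divided by the total number of errors."""
    high_errors = sum(wrong for _, wrong, level in blocks if level == "high")
    total_errors = sum(wrong for _, wrong, _ in blocks)
    return high_errors / total_errors if total_errors else 0.0


if __name__ == "__main__":
    for name, blocks in STUDENTS.items():
        print(name, weighted_score(blocks), round(gamble_score(blocks), 2))
```

On these figures the over-confident student has a gamble score of 1.0 and the timidly cautious student a gamble score of 0.0, with the other two in between.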


1.8. Ratio of True-False Statements 


Some investigators have found that false statements are more
discriminating than true statements, which may be due to the
fact that when students are not sure of the correct response they
tend to accept, rather than doubt, the
statement. If this is correct, advantage may be taken to include
more false statements (about 60%) than true statements. If
more than that, say about 75% false statements, are included, then
students are likely to recognize this imbalance while attempting
and may take advantage by marking a statement false when in
doubt. With weighted scoring, the reliability of true-false tests
appears to increase without increasing the number of items as
in conventional tests. Cronbach (5) and Thorndike have shown
that reliability increased from 0.574, 0.765 and 0.728 to 0.713, 0.828
and 0.821 respectively through this method. A reliability coefficient
of 0.80 or higher for classroom tests of educational achievement is quite
respectable. This method may be tried out to see how far it
discourages the students’ tendency to gamble through guessing.
Constant alternatives, more commonly known as true-false
items, are no doubt notorious for their chance scores due to the
50% guessing possibility. Nonetheless, these items are one
of the most usable forms which, if constructed with care to
test objectives other than mere reproduction of facts, can be
used more profitably by classroom teachers. The weighted
marks system discussed here can go a long way in improving
the discriminating power of true-false items. The utility of such
items lies more in ensuring high-quality items during construc-
tion than in taking care of the guessing aspect by using correc-
tion formulae. However, increasing the number of choices from
two (True-False) to four or five would definitely reduce the
guessing possibility, as in the case of multiple-choice items
which are discussed in the next chapter.


2. Matching-Type Items 
2.1. Design 


Matching-type items are usually presented as a set of defini-
tions, terms, events, names, phrases, etc., called the stimuli, which
are generally placed in the left-hand column, say I, and
another set of names, terms, pictures or statements, called the
responses, placed in the right-hand column, say II. Res-
ponses are to be matched with the stimuli. The items in column
I are called premises and those in column II are called
responses. Thus there are two lists and students are required to
match each item with the corresponding response from the other
list. Since each item from one list is to be matched with the
corresponding response, it is considered as one test item
which is counted as one score point. However, a
number of items of a similar nature are put together
so that the student is forced to make pairs out of the given
responses.

The same kind of relationship should hold for each of the
premises and the responses in the exercise. These relationships
may be between a term and its definition, an object and its func-
tion, an event and its date, an author and his work, an inventor and
his invention, cause and effect, problem and solution, examples and
principles, etc. A matching-type item is in fact a multiple-choice
item in which the items (premises) are different but the set of
responses, out of which one response is to be selected, remains
the same. In ordinary multiple-choice items every
item is followed by a separate set of 4 or 5 responses of which
one is correct (key), whereas in the matching type
every response acts as key for one or the other item in the list.
Therefore, basically it is a modification of the multiple-choice
item. If the number of responses equals the number of items
(premises) then it is termed perfect matching, and when the
number of responses is more than the number of premises it is
called imperfect matching. In either case, the matching is to be
done by the examinee on some basis or criterion which may be
implicit or explicit. If there is a single criterion for matching
the items with the responses then it is called simple matching,
but if more than one criterion is made the basis for matching,
with more than two lists given, then it is called compound
matching. The following illustrations clarify the concept
of different varieties of matching-type items.
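
The terminology can be summarised in a few lines of code. The sketch below is an illustration under stated assumptions: the function name and return labels are hypothetical, and the label for exercises with fewer responses than premises is an addition for completeness, since that case arises only where responses may be re-used.

```python
def classify_matching(n_premises, n_responses, n_lists=2):
    """Label a matching exercise by the number of responses and lists."""
    if n_premises < 1 or n_responses < 1 or n_lists < 2:
        raise ValueError("need at least one premise, one response and two lists")

    if n_responses == n_premises:
        balance = "perfect"
    elif n_responses > n_premises:
        balance = "imperfect"
    else:
        # Fewer responses than premises: some responses must be re-used.
        balance = "re-usable responses"

    # More than two lists implies more than one basis for matching.
    basis = "compound" if n_lists > 2 else "simple"
    return balance, basis


if __name__ == "__main__":
    print(classify_matching(4, 6))     # four terms, six definitions
    print(classify_matching(4, 4))     # four discoveries, four discoverers
    print(classify_matching(4, 6, 3))  # characteristics, animals, classes
```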


2.2. Varieties of Matching-type Items 


(i) Column matching (Imperfect Homogeneous variety)

For each term in column I select the statement from column II
which best defines it and write its serial number in the bracket
provided on the left against each item.


Column I

(E) 1. Curricular validity
(D) 2. Construct validity
(B) 3. Concurrent validity
(C) 4. Predictive validity

Column II

A. Estimating correlation between parallel forms of the same test.
B. Estimating correlation between two measures obtained at the
same time.
C. Estimating correlation between some measure and a criterion
measure obtained at a later date.
D. Estimating how well the intended abilities hypothesised
are actually measured.
E. Estimating how well the content and instructional objec-
tives are sampled and measured.
F. Estimating relationship between two measures of the same
person.


(ii) Column matching (Perfect Heterogeneous variety)

Match the discoveries in science given in column I with the
names of discoverers given in column II and write the corresponding
number in the bracket provided on the left against each of the
items.


I

(B) 1. Demonstrated circulation of blood
(C) 2. Demonstrated statistical approach to human heredity
(D) 3. Demonstrated crucial experiments on the mechanism of heredity
(A) 4. Demonstrated experiments against spontaneous generation

II

A. Pasteur
B. Harvey
C. Galton
D. Mendel


(iii) Compound matching (more than one basis, 3 lists)

Three lists are given in three different columns. Select the
matching animal and the class corresponding to each charac-
teristic given in column I and write their serial numbers in
the same order against each item in the brackets provided to the
left. (6)


I. Characteristics

(F) (a) 1. Body covered with scales
(C) (e) 2. Respires by gills
(A) (d) 3. Lives both on land and in water
(E) (b) 4. Suckles its young ones

II. Animals

A. Frog
B. Pigeon
C. Fish
D. Butterfly
E. Bat
F. Snake

III. Class

a. Reptilia
b. Mammalia
c. Aves
d. Amphibia
e. Pisces
f. Arthropoda


(iv) Classification variety (Re-usable responses)
Identify whether the sentence is simple, compound or complex
and write against each item, in the bracket provided on the left,


A. if the sentence is simple
B. if the sentence is compound
C. if the sentence is complex


(A) 1. Sidhu went to his school on bicycle.

(C) 2. If Anjum were in Delhi she could have visited the Asiad
with her friends.

(B) 3. In Delhi summer days are long and nights are short.

(A) 4. Mandeep is an obedient boy.

(B) 5. During the 9th Asiad new stadia were erected and new
flyovers built.


(v) Alternative response matching

In this case each response may be relevant or irrelevant to the
corresponding item and the examinee is required to indicate by
letter or serial number whether or not the response corresponds
to the specified item.


Example

In the blank before the number of each datum write the letter
R if the hypothesis is relevant to the datum and write I if the
hypothesis does not support the datum or is irrelevant.


I. 1. Datum: With above-average students the correlation between
scores on an intelligence test and a creativity test is 0.00.
Hypothesis A: Creativity is related to intelligence.

R. 2. Datum: Scores of 30 year-olds tend to be slightly higher
than those of 15 year-olds.
Hypothesis B: Some mental abilities tend to increase after the
age of 20.

R. 3. Datum: I.Q. scores of identical twins raised apart tend to
correlate higher than the I.Q. scores of fraternal twins raised
in the same family.
Hypothesis C: Intelligence is an inherited trait.


(vi) Check-list variety
It requires the student to go through a list of phrases, state-
ments or words and tick-mark the ones that correspond (match)
to the criterion or the base which is indicated in the direction
of the items.


Example

Examine the list given below and tick-mark the words or
phrases which relate to estimation of the reliability of a teacher-
made test.


1. Stability .........          7. Equivalence .........
2. Concurrent .........         8. Split half .........
3. Predictive .........         9. Stability and equivalence .........
4. K.R.-20 .........           10. Judgemental .........
5. K.R.-21 .........           11. Construct .........
6. Facility index .........    12. Discrimination index .........


(vii) Master-list (Single Key) 

It requires the students to choose from a constant group of 
alternatives, usually 3 to 5, the one that applies best to a group 
of items. The key response may be reused or one of the 
responses may not act as a key for any of the items in the 
group. Thus a master-list resembles the common two-column 
matching items in having the same set of responses reusable 
for different items (Premises). It differs from it in that the 
choice of options remains the same (constant) for each item 
unlike the traditional matching-type items in which after
attempting each item the number of options (responses) does
not remain constant but goes on becoming less and less after
each item. So choices can be used more than once for
different items. Moreover, the number of items is much more
than the number of keys (master-list). These items differ from
multiple choice items in that the same set of choices (key list)
is usable throughout, unlike in the multiple choice items in which
alternatives change from item to item.


Example
From the list (master key) at the top, select the kind of plants
which show morphological features corresponding to each of
the items that follow:


A. Desert plants B. Alpine plants C. Water plants
D. Marshy plants


(B) 1. A well-developed root-system with reduced leaves on
stem.

(A) 2. A deep root-system, succulent stems with reduced
leaves.

(D) 3. Presence of respiratory roots and capacity to with-
stand salt water.

(B) 4. Roots spread near the surface with stems having
needle-like leaves.

(C) 5. Poorly-developed roots with abundance of air cavities.

(A) 6. Flattened green stems with leaves modified into spines.

(C) 7. Dissected roots with broad leaves having stomata on
upper surface.


(viii) Master-list (Combined Key)


Example


Some words are italicised in each of the sentences given
below. Put in the bracket


A. if the word is used as a noun.
B. if the word is used as an adjective.
C. if the word is used both as a noun and as an adjective.
D. if the word is used neither as a noun nor as an adjective.

Sentences

(D) 1. Slow and steady wins the race.

(C) 2. The Race course in Delhi is being utilised for horse
races.

(A) 3. Velodrome is a stadium for cycle races.

(B) 4. Thirty three nations took part in the ninth Asiad held
at Delhi.

(C) 5. Women hockey players played hockey well against
Japan.

(A) 6. Indian women gymnasts gave a good performance.


2.3. Usability


The compactness of matching items makes them an efficient form of
question in terms of space and testing time. These are good
for making a rapid survey of certain subject matters like those
of leading personalities, groups of events, definitions of basic
terms etc. Such items are good for testing students' ability to
identify objects presented graphically or pictorially, translate
terms and definitions from one form to another and establish relationships
between two sets of objects, concepts, events, etc. A major
criticism of these items is that they can be used to measure only
lower level learning outcomes like recall of dates, events, etc.
But this is not true, as the above examples should convince us
that the relationship between problem and solution, data and
hypothesis, characteristics and classification, cause and effect,
etc. can be tested provided that the item writer is well trained
to develop objective-based items.

Nevertheless, if sufficient care is not taken, matching items
may encourage serial memorisation rather than association
and relationships. It is also difficult sometimes to find a
homogeneous cluster of items which in turn need homogeneous
responses. Some common errors that are found in such items
are: (a) vague directions, (b) vaguely stated premises, (c) unusually
long statements or responses, (d) lack of homogeneity in
responses, and (e) difficulty of students regarding the mode of
responding or keying the items. If the following points are kept
in mind while constructing the items, the quality of such items
can be greatly improved.


2.4. Steps in Construction 


The following steps are possible in constructing matching
exercises:


(i) Decide what is the intended aspect of measurement.
(ii) Make a list of the premises to be used.
(iii) Identify the responses including the distractors.
(iv) Arrange the items in some order.
(v) Write the direction for pupils.
(vi) Indicate the space and mode of responding.


2.5. Suggestions for Constructing Matching Items 


F , А ; ct- 
The following suggestions may be kept in mind while constru 


Objective-Type Items (Other Varieties) 303 


ing items: 


(i) Be cognizant of the objective and content elements which 
form the table of specification of the proposed test. 
(ii) Indicate clearly the basis of matching in the direction of 
the items or in the headings of the columns. 
(iii) Have one of the lists consisting of short phrases, single 
words, or numbers for quick examination and reducing 
the reading load. 


(iv) Use only homogenous and closely related materials in any 
one matching exercise. 


(v) Use heterogenous stimuli and responses if the item is to 
be set at an easy level. 

(vi) Make the number of alternatives larger (about 11 times) 
than the number of items to reduce the chance of guessing 
for the items attempted towards the end. 

(vii) Keep each list short, ranging from 5 to 12 premises and 


responses, the optimum size being 5-8 items per 
exercise. 


(viii) Avoid clues of any kind.

(ix) Avoid grammatical inconsistencies like ‘all men’, ‘all
women’, all plural, all singular, etc. to minimise relevant
clues.

(x) Every premise must have one clear plausible answer in
the other column.

(xi) Label each set of premises and responses, especially when
a separate direction is not given, to help the examinee to
understand his task well.

(xii) Give clear instructions if some of the responses can be
used more than once.

(xiii) Avoid framing long sets of items for matching exercises
lest they should overemphasise the objective on which the
exercise is based.

(xiv) Arrange responses chronologically, alphabetically or in
some other systematic order to help pupils in finding the
response.

(xv) Keep the whole exercise on the same page; matching items
with responses given on two pages cause confusion.
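
Suggestion (vi) can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: each response may be used once only, and the pupil has matched every premise except the last and has struck off the responses already used. The function name chance_of_guessing_last is hypothetical and does not come from the text.

```python
from fractions import Fraction


def chance_of_guessing_last(num_premises, num_responses):
    """Chance of guessing the last premise blindly once all other
    premises are matched and their responses are struck off.
    Assumes each response may be used only once."""
    if num_responses < num_premises:
        raise ValueError("need at least as many responses as premises")
    remaining = num_responses - (num_premises - 1)
    return Fraction(1, remaining)


# Six premises and six responses: only one response is left for the last premise.
print(chance_of_guessing_last(6, 6))
# Six premises and nine responses: four responses are still open.
print(chance_of_guessing_last(6, 9))
```

With equal lists the last match is given away; the extra responses keep some uncertainty even for the final premise.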


3. Rearrangement Type


3.1. Design 


Such items require the pupils to rearrange randomly
presented material into some specified order. The material may
be presented in the form of a series of statements one after the
other or in the form of responses as given in the multiple choice
type, i.e. A, B, C, D etc. A direction is provided as to whether
the responses are to be rearranged by writing them out in the
specified order, by numbering them serially in a particular order,
or by indicating the serial number of each response, etc.


3.2. Types 
(i) Chronological order 


Example
Rewrite the serial numbers of the events given, according to
chronological order, in the space provided.

1. V.V. Giri          3. Sanjiva Reddy
2. Zakir Hussain      4. Radhakrishnan
Rewritten as: 4 2 1 3


(ii) Functional order 


Example
Rearrange the following steps involved in the manufacture of
food by plants, in order of their occurrence and write the serial
number in the bracket provided against each.

1. Formation of PGAL (5)
2. Formation of A.T.P. (2)
3. Excitation of chlorophyll (1)
4. Splitting of water (3)
5. Evolution of oxygen (4)
6. Formation of starch (6)


(iii) Logical order

In developing a blueprint of a unit test, the following steps are
adopted. Write the correct serial number of each step in the blank
provided against it to indicate the correct order.

S. No.  Steps                                               Correct order
1.      Writing the questions                               4
2.      Deciding about the total marks for the test         1
3.      Identify major concepts to be tested                2
4.      Editing the items                                   6
5.      Reviewing the test                                  7
6.      Formulate unit objectives relating to concepts
        identified                                          3
7.      Developing the model answers                        5


(iv) Developmental order
In certain cases, the content matter is more amenable to a
developmental order which may be natural, evolutionary or of
any other form. Students are required to reorder the words,
statements, steps, etc., according to the specified criteria.

Example

The evolutionary developmental order of the following animals
is given in order from 1 to 5. Rewrite the sequence showing the
correct order of development.

1. Bat 2. Amoeba 3. Star fish 4. Pigeon 5. Shark fish
(Incorrect)
1. Amoeba 2. Star fish 3. Shark fish 4. Pigeon 5. Bat
(Correct)
(v) Hierarchical order
In terms of increasing complexity, the facts, concepts, principles,
etc. can be arranged in a hierarchical order. Students are required
to rearrange the randomly-placed material accordingly.


Example

Rearrange the given order of educational objectives of (i) Bloom,
(ii) Krathwohl, and (iii) Simpson in terms of hierarchical
learning, by indicating the serial number against each objective
listed.

(i)                     (ii)                      (iii)
1. Knowledge (1)        1. Organisation (4)       1. Set (2)
2. Application (3)      2. Characterisation (5)   2. Perception (1)
3. Analysis (4)         3. Valuing (3)            3. Complex overt response (5)
4. Evaluation (6)       4. Receiving (1)          4. Originating (6)
5. Comprehension (2)    5. Responding (2)         5. Mechanism (4)
6. Synthesis (5)                                  6. Guided response (3)


(vi) Taxonomic order 

In this case the responses are arranged in order of certain
categories or classes based on a natural relationship, as we
know in the case of classification of plants and animals.


Example


Examine the following order of classification of certain animals
and rewrite the correct order in terms of natural relationships
(i) or indicate the correct order by rearranging the serial
numbers (ii).


(i) (Jumbled)   (Rearranged)      (ii) (Jumbled)   (Rearranged)

1. Phylum       Kingdom           1. Animalia      1
2. Kingdom      Phylum            2. Primates      3
3. Order        Class             3. Hominidae     4
4. Class        Order             4. Mammalia      2
5. Species      Genus             5. Sapiens       6
6. Genus        Species           6. Homo          5
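
Rearrangement items of all these kinds can be keyed and checked mechanically. The sketch below is a minimal illustration and not a prescribed procedure: the all-or-nothing scoring rule and the function names are assumptions, while the key is the one given in the functional-order example (steps in the manufacture of food by plants).

```python
def order_from_ranks(ranks):
    """Turn the rank written against each item into the sequence of
    serial numbers, e.g. ranks [2, 1] give the order [2, 1]."""
    return sorted(range(1, len(ranks) + 1), key=lambda serial: ranks[serial - 1])


def score_rearrangement(key_ranks, pupil_ranks, marks=1):
    """Assumed all-or-nothing rule: full marks only for a fully correct order."""
    return marks if list(pupil_ranks) == list(key_ranks) else 0


# Key from the functional-order example (rank written against items 1 to 6).
KEY = [5, 2, 1, 3, 4, 6]

print(order_from_ranks(KEY))                          # serial numbers in order of occurrence
print(score_rearrangement(KEY, [5, 2, 1, 3, 4, 6]))   # fully correct response
print(score_rearrangement(KEY, [5, 1, 2, 3, 4, 6]))   # two steps interchanged
```

A partial-credit rule (for example, one mark per correctly placed step) could be substituted if the marking scheme calls for it.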


4. Pictorial Varieties 


4.1. Design 


Students are presented with pictorial or graphic stimuli on
which items are based. A picture, chart, table, sketch, diagram
or a graph can be given as a stimulus, followed by responses in
the case of selection items. In the case of free response questions,
the questions can be put directly on the given pictorial stimulus.
Thus objective-type, short-answer type and even essay-type
questions can be set on the chosen picture or diagram, which
acts as a base or medium for testing various abilities. Likewise
a picture or a diagram can be used as a problem by itself, in
which the solution lies in the drawing of a diagram, or in its
completion or modification, depending upon the purpose of the
exercise. Thus diagrammatic questions or items
can be classified into two types as under:


4.2. Classification 


Diagram
  1. Diagram as an objective/problem
     (a) Diagram to be drawn (Verbal stimulus)
     (b) Incomplete diagram to be completed or modified (Pictorial stimulus)
  2. Diagram as a medium/means/figural content
     (a) Diagram is not to be drawn but used as a medium for testing
         (Pictorial stimulus)


4.3. Varieties of Pictorial Questions/Items


A. Diagram as objective

In this case the diagram is to be drawn, completed, modified,
rectified, developed or labelled. The objective is to test students’
skill in drawing.


1. Diagrammatic sketches (magnified or reduced in size)
(i) Draw an electron microscope diagram of an animal cell.

2. Faithful diagrams (actuals)
(i) Draw a diagram of the experimental set-up placed before
you on the table.
(ii) Draw a diagram of the material as you see it under the
microscope. (A slide is given for identification.)

3. Incomplete diagrams (missing parts)
(i) In the given diagram, showing the longitudinal section of a
typical bisexual flower, draw the missing structures.

4. Modified diagrams (rectifying)
(i) Four sketches representing various stages of a mitotic
division are given. Identify the wrongly drawn/wrongly
labelled parts and re-draw/re-label the same correctly.

5. Representational diagrams (data-providing)
(i) Draw a diagram showing the population trend in India
given below:

Year:                    1947  1950  1960  1970  1980
Population in millions:   400   450   500   550   650

6. Outline diagrams
(i) Draw an outline political map of India and fill in the
capital cities of each state.
(ii) Draw an outline sketch showing the internal structure of a
dicot stem as seen in transverse section.


B. Diagram as a means/basis

Diagrams can be used for testing various skills and abilities.
Examples for testing drawing skills have been given above. The
following examples illustrate how diagrams can be used as a
basis for testing various abilities. A diagram is provided in
each of these cases and the pupils are required to observe or
examine the diagram/s and answer the question based on the
diagram/s.


4.4. Ability tested 


1. Recall
In the given diagram label the parts 1, 2, 3 and 4.
2. Recognise
In the given diagram of a plant cell, label the cell wall,
mitochondrion and chloroplast.

3. Translate
In the labelled floral diagram given below which part bears
the same relationship to stamens as label-2 to the carpel?
4. Cite example
The proximate food constituents are represented in the form
of a diagram given opposite. This diagram is an example of a

A. flow chart.
*B. pie graph. (Pie graph provided)
C. pictograph.
D. table.


5. Identify relationship 

Which organelle in the given diagram of a fresh water amoeba 
is concerned with osmoregulation? 
6. Compare 

Which of the following two diagrams labelled as A and B is 
that of an animal cell? (Two diagrams are provided). 


7. Classify 

Which of the leaves (4 diagrams of leaf-tips showing 
reticulate venation in 3 and parallel in one are given) does not 
belong to the same group to which the others belong? 
8. Detect errors 

Find the error if any, in the experimental set-up shown in the 
diagram for preparation of oxygen gas. 


9. Interpret 


What does this graph indicate regarding the rate of 
photosynthesis? (A graph is provided) 
10. Explain 

Observe the experimental set-up for the preparation of oxygen
gas in the laboratory (a diagram is given). Why do we not
keep the delivery tube-A dipped in the solution like that of
thistle funnel-B?

11. Read instruments
Observe the readings indicated in the three thermometers. In
which case is it the highest and in which case is it the lowest?
(Diagrams are provided)
12. Analyse
In the given diagram, the level of solution-Y comes down
from A to B after 3 hours. This could be possible only if we
assume that the concentration of solution-X is
(A diagram showing a semipermeable membrane is provided)

*A. more than solution-Y
B. less than solution-Y
C. equal to solution-Y
D. nothing to do with the fall in level.


13. Hypothesise
In the diagram given in item 12 above, if the solutions X
and Y are interchanged the level in the thistle funnel rises from B
to A. When more water is added to solution-X a stage comes
when there is neither rise nor fall in the level. What hypothesis
would you frame to explain these observations?
14. Suggest procedures 

Identify the hypothesis underlying in the experiment shown 
above and suggest an alternative experiment to demonstrate 
that hypothesis. 


15. Reason
The 3 sets of diagrams X, Y and Z indicate the effect on
the shape and turgidity of a freshly-cut potato slice.
How would you account for this change?

Fig. X: Empty beaker    Fig. Y: Beaker having water    Fig. Z: Beaker having sugar solution


16. Infer/conclude

Observe the two stages shown in diagrams A and B of a
plant cell given below. What inference can you draw about
the effect on the cell content (plasmolysis) and the physiological
process involved in it?
(Figs. A and B are provided)


17. Predict

In the diagram given below, if this species is transferred to the
sea and gets adapted, which of the following observations is
most likely to be true?
(A diagram of a fresh water amoeba with organelles A and B is provided)

A. Organelles A and B would both disappear.
*B. Organelle A would disappear with no effect on B.
C. Organelle B would disappear with no effect on A.
D. Neither organelle A nor organelle B would disappear.


5. Structured Questions 


5.1. Concept 


The nomenclature of structured questions is rather confusing.
Different people use the term differently and as such its connotation
also varies from one context to the other. In different texts they
are referred to as ‘thought questions’, ‘broken up essays’ (8),
‘split essay’, ‘connected set of questions’, ‘subsequential
sub-questions’, ‘passage-based questions’ etc. These questions are
in fact as old as reading-comprehension testing, whereby a
few questions based on a given paragraph used to be set to test the
reading comprehension of students. Thus they are not a novel
variety but only a rediscovered or re-emphasised variety that
has gained more attention from evaluators for gauging the
wide-ranging abilities of students. With the increased emphasis on
the development of higher-level abilities in our instructional
programme, the need for greater use of structured questions in
the assessment of pupils' achievement is being increasingly
realised. It is, therefore, useful to get an insight into the form
and functions of structured questions.


5.2. Format 

A typical structured question is one which is rooted in a given
or stipulated situation providing the needed premises, material,
or data that provide the uniform content to be used as the
medium of testing. The basic material forms the introductory
statement that prescribes the content limits and gives direction to
the examinees. The examiner desires that this introductory
information must be used for answering the questions that follow.
Thus the introductory statement is followed by a number of
sub-questions based on the subject matter forming the basis.
An initial question is set, followed by a few more sub-questions
in a sequential order. Each sub-question relates to a specific
ability desired to be tested through the given material, which
remains the same for every sub-question. The diagram (Fig. 1)
represents the format of structured questions.


5.3. Characteristics
The diagrammatic sketch (Fig. 1) may help us to
derive the following characteristic features of structured questions:

[Fig. 1: Format of the Structured Questions. Basic introductory
material based on selected theme, followed by sub-questions: an
initial sub-question and successive sub-questions.]

1. The same introductory material is used for every
sub-question.
2. Introductory material may be presented in the form of
a statement, a passage, table, diagram or experimental data.
3. Each sub-question is based on a complex problematic
situation and not a usual textual situation.
4. Introductory material should neither lack any material
without which an answer is not possible, nor should it
provide extra information which is not useable and adds
to the reading load of students.
5. The subject-matter content is based on one specific theme.
6. Sub-questions may be structured in the form of the short
answer, restricted (broken) essay or objective type.
7. Usually the same form of sub-questions is used in one set
of questions linked to the central theme.
8. Each sub-question must demand reading of introductory
material for answering, i.e. one should not be able to answer
a question without referring to the introductory statement.
9. Each sub-question is independent of the other and is not
dependent on a knowledge of the previous question.
10. Each sub-question tests a different ability as far as possible.
11. None of the questions attempts to test mere recall of
factual information but involves testing of higher level
abilities.
12. Sub-questions are arranged in a predetermined particular
order or criterion, which may be the estimated difficulty
level, abilities being tested, hierarchy of concepts involved
or the form of sub-questions, if more than one
form of question is used.
13. Depending upon the abilities to be tested, a suitable form
of question may be used. However, objective type and
short-answer type questions are more useable and profitable.


5.3.1. An illustration 


Passage

A batch of students of life sciences, while visiting a game
sanctuary in a reserve forest, saw a buck being killed and eaten
by a tiger. Earlier they had seen a herd of bucks feeding upon
the grass. They also saw a peacock catching a snake. Under
the tall trees a leafless, golden yellow vine was found closely
attached to the twigs of a green shrub. This vine, named
dodder (Cuscuta), showed some spots caused by a fungus.*

Read the above passage and answer the following questions:

21. In the above ecosystem if the grass is the producer the
secondary consumer is
1. buck.
2. tiger.
3. peacock.
4. snake.

22. The relationship of the fungus with Cuscuta is that of
1. hyperparasitism.
2. parasitism.
3. mutualism.
4. cannibalism.

23. The inter-relationship of grass, buck and tiger as revealed
in the above passage is termed as
1. flow of energy.
2. food pyramid.
3. food web.
4. food chain.


*Source: An old N.T.S. question paper.


5.4. Useability of Structured Questions


The use of structured questions depends on the purpose of
the test, the nature of examinees, subject-matter to be tested
and abilities to be tested. (9) We may take each factor one by
one.

The purpose of the test determines to some extent the nature of
the questions. Achievement tests aim at measuring the status
of the individual in terms of instructional objectives. A
diagnostic test aims at discovering inadequacies in learning and
their causes. A predictor test, like an aptitude test, aims at
collecting such evidence about students as helps to
predict the likely future success of individuals, as in tests used for
selection purposes, e.g. the National Talent Search Examination of
the NCERT. Structured questions can be used effectively in all
the three types. However, in certain cases, e.g. in a practical
test of science, the use of structured questions is not profitable.
Since our discussion is focussed on achievement testing, in which
the language and content of a subject is assumed as a learning
experience, the use of such questions is very limited in
diagnostic tests and even more so in predictor tests.

The nature of the group determines the homogeneity or
heterogeneity of the students. In a heterogeneous group,
questions testing different abilities or questions at various levels of
achievement are needed, for which structured questions are very
useful. Smaller groups are likely to be more homogeneous
and therefore require questions at almost the same level. Thus,
when large heterogeneous groups are to be tested, these questions
are very useful. Likewise, when several similar groups
are to be tested at different times for comparison over
the years in the same institution, structured questions can be
fruitfully used.

When compared to the essay-type, the greater objectivity in
the marking scheme is another favourable point when marking
is to be done by a large number of examiners, as in the case of
public examinations. As structured questions demand some
minimum level of reading comprehension and writing
ability on the part of a pupil, such questions should be used
carefully, keeping in view the students’ ability to comprehend
the written material at a reasonable speed.


The subject matter also affects the construction of such
questions. If the question is heavily loaded with subject matter
vis-a-vis the concepts being tested, then most of the questions
set on that material would be deeply embedded in that content.
They can, however, be made independent of the content to a
great extent. In other words, the minimum relevant content is
provided for equalisation of content, thereby allowing every
candidate to answer using the same content. Thus different
abilities may be tested through these questions.


Of central importance are the abilities that should be measured
through structured questions, such as the pupils' ability to
understand, apply, analyse, interpret, hypothesise and evaluate,
besides their creative abilities. Hudson (1973) gave the
following example (Fig. 2) in which we find an introductory
statement followed by sub-questions. Each question is set on
the higher level intellectual abilities, of course assuming the
basic knowledge and concepts regarded as pre-requisite to the
answering of questions. Each question is an independent
measure and is not dependent on any other question, thereby
measuring independent abilities (10).


Structured Question: Biology


The production of heat as a by-product of metabolic processes
may be demonstrated using apparatus as shown in the diagram
(Fig. 2).
Soaked live pea seeds are placed in one flask and an equal
quantity of soaked peas which have been killed by steeping in
mercury (II) chloride solution are placed in the other. The
rise in temperature caused by metabolic activity of the live peas
causes an expansion of the air in that side of the apparatus and
thus a pressure difference which is recorded by the water
manometer tube.


[Fig. 2: Diagram of the apparatus; label: Insulated flasks]
Source: Assessment Techniques: An Introduction,
B. Hudson, pp. 100-107.


Several different people setting up this experiment proceeded
in different ways.

(i) One placed equal numbers of peas in each flask, another
equal weights and a third equal volumes. What are the
disadvantages of each of the three procedures?

(ii) One person used metal insulated flasks while another used
clear glass insulated flasks. What differences would you
expect in the results and why?

(iii) One person set up the apparatus with the tubes X each
approximately 100 mm in length while another used tubes
over 500 mm in length. Which person would be likely to
obtain the more accurate results and why?

(iv) One person placed only a small quantity of peas in each
flask while another almost filled each flask with peas.
Which factors should be considered in this experiment in
determining the most appropriate quantity to be used?

(v) Even given ideally matched quantities of pea seed and the
most appropriate form of flasks and tubing, there are
factors other than thermal expansion which could affect
the volume of air on each side. Describe one such factor
and its probable effect.

(vi) This experiment can be used as a qualitative demonstration
but could not be used in any quantitative investigation.
Explain which features of the apparatus and
procedure prevent this.

It is observable that all the questions, given in the example, 
test the higher intellectual abilities such as analysis, interpreta- 
tion, reasoning, cause-effect relationship, inference, prediction 
etc. All the six questions lend themselves to relatively objective 
marking and each question is independent of the other, thereby 
making assessment more reliable. From this illustration, it is 
quite obvious that structured questions can be made use of 
quite profitably for testing varied intellectual abilities. 

When to use such questions and when not to use would 
therefore depend on the purpose of the test, the nature of the 
subject-matter, the nature of the group and the abilities to be 
tested. 


5.5. Presentation 


When a series of structured questions is to be constructed,
one can take advantage of the type of stimulus presented and
the possible reception senses like sight, hearing, touch, taste and
smell. The stimulus can be presented in the form of words,
apparatus, material, tables, graphs, numbers, symbols, pictures,
diagrams etc. If all questions are presented in words, of course
with sight as the reception sense, then the pupils might feel
bored. In the case of long tests, it is advisable to present the
material in a varied manner to cater to pupils’ interest. This
would also ensure a more accurate measure of students’ abilities.
Comprehension of material in the least possible time, with the
minimum effort, is what is required of a good introductory
statement. The material may, therefore, be presented through a
diagram. The introductory statement must be phrased in simple,
comprehensible language. This ensures more reliability by
attributing the performance of students to their developed
abilities rather than letting it be contaminated by their language
ability to comprehend. Real, authentic and accurate descriptions
should be utilised rather than fictitious data, for which a
variety of source material may be investigated. Use of a familiar
diagram, or of stereotyped and difficult language, would spoil the
quality of the structured questions. Concise, comprehensible
and accurate content presented in an appropriate form is the
pre-requisite to a good introductory statement.


6. Marking Schemes 


As with any other type of question, it is always advisable to write
the answer at the same time as the question is prepared.
In fact, it is better if a complete answer is written out as expected
from the examinee. This helps to obviate various types of flaws
and inadequacies in the answers and also helps in chiselling
the question itself so that it elicits the intended answer. Although
complete objectivity of marking requires that all
possible correct answers be pre-determined and written into
the marking scheme, this is not an easy task, especially
when open-ended questions are used, which allow freedom of
response to some extent. If the sub-questions are properly
structured, keeping in view the ability to be tested, and are
pinpointed using definite language, it is possible to write a precise
and objective marking scheme. This is not possible in the case of
essay-type and such short-answer type questions as allow more
freedom of response on the part of examinees.

If alternative answers are possible to the questions, then the
question itself should be re-examined and revised. Likewise, if
the credit points in a particular answer are very many, then that
sub-question may be reviewed to make it more pinpointed or
it may be split up into two or more separate sub-questions. A
good structured question may require not more than one, two
or three marks for each sub-question.
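
Some of these checks are mechanical enough to be tried out on a draft question. The sketch below is only an illustration: the class and function names are hypothetical, the ability labels are free text chosen by the item writer, and the three rules coded (not more than three marks per sub-question, no mere recall, a different ability for each sub-question) are taken from the characteristics and suggestions given above. It does not judge the quality of the question itself.

```python
from dataclasses import dataclass, field


@dataclass
class SubQuestion:
    text: str
    ability: str   # label chosen by the item writer, e.g. "analyse"
    marks: int


@dataclass
class StructuredQuestion:
    stem: str                                   # common introductory material
    sub_questions: list = field(default_factory=list)


def review(question, max_marks=3):
    """Return a list of remarks; an empty list means nothing was flagged."""
    remarks = []
    seen = set()
    for number, sub in enumerate(question.sub_questions, start=1):
        if not 1 <= sub.marks <= max_marks:
            remarks.append(f"({number}) carries {sub.marks} marks: consider splitting it")
        if sub.ability == "recall":
            remarks.append(f"({number}) tests mere recall")
        if sub.ability in seen:
            remarks.append(f"({number}) repeats the ability '{sub.ability}'")
        seen.add(sub.ability)
    return remarks


# Hypothetical draft used only to show the checks at work.
draft = StructuredQuestion(
    stem="(introductory passage)",
    sub_questions=[
        SubQuestion("(sub-question 1)", "analyse", 2),
        SubQuestion("(sub-question 2)", "analyse", 5),
        SubQuestion("(sub-question 3)", "recall", 1),
    ],
)
for remark in review(draft):
    print(remark)
```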

The marking scheme may be a personal marking scheme,
when it is to be used by the individual teacher for students in
his own class, or it may be a marking scheme to be used
by more than one examiner, as in the case of a public examination.
In both cases it has to be worked out very carefully,
and special care may be taken in the case of the personal marking
scheme because it will not be discussed with other colleagues
and amended before it is finalised.

While developing the marking scheme, care may be taken
that each value point is given proportionate weightage
out of the total allocated marks, say 10. The marks for the
value points should invariably add up to the exact marks
allocated for that question.
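
The requirement that value points carry proportionate weightage and still add up exactly to the allocated marks can be sketched as below. This is an illustration under assumptions, not a prescribed procedure: the function name, the whole-mark allocation and the rule of giving any leftover mark to the value points with the largest fractional share are all choices made here, and the value points and weights are hypothetical.

```python
from fractions import Fraction


def allocate_marks(weights, total_marks):
    """Split total_marks over value points in proportion to their
    (whole-number) weights, in whole marks that add up exactly."""
    weight_sum = sum(weights.values())
    exact = {point: Fraction(w * total_marks, weight_sum) for point, w in weights.items()}
    marks = {point: int(share) for point, share in exact.items()}  # whole part only
    leftover = total_marks - sum(marks.values())
    # Give the marks still unallocated to the largest fractional shares.
    by_fraction = sorted(exact, key=lambda point: exact[point] - marks[point], reverse=True)
    for point in by_fraction[:leftover]:
        marks[point] += 1
    return marks


# Hypothetical value points with relative weights 3 : 2 : 1 for a 10-mark question.
scheme = allocate_marks({"identifies factor": 3, "explains effect": 2, "gives example": 1}, 10)
print(scheme)
print(sum(scheme.values()))
```

Whatever rule is used for the leftover marks, the final check is the same: the total of the value points must equal the marks allocated to the question.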

There are some questions where objectivity in marking is
not really possible and which are in fact marked entirely on a
subjective basis. It is only in structured questions that the writer
can frame questions and develop the marking scheme in a
manner that minimises subjectivity in marking. In spite of this,
the examiner may find an answer which is correct according
to him although that answer may not be indicated as the
intended answer in the marking scheme. In such cases, marks
may be allotted by the examiner in terms of the intended
answer.

Head (1967) showed that in such cases the rank order of the
pupils taking a particular test is hardly affected even though
the range of total marks awarded may differ significantly when
the same answers are marked strictly according to a scheme
which is inflexible (11). Marking is sometimes a subjective process,
but it should not be based entirely on personal judgement. A
good marking scheme should, however, indicate the criteria of
judgement along with the allocation of marks for each criterion as
clearly as possible.
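
The distinction between rank order and range of marks can be made concrete with Spearman's rank correlation. The sketch below is only an illustration: the marks for six pupils are invented for the purpose and are not Head's data, the function names are hypothetical, and the simple formula used assumes that no two pupils have the same total.

```python
def ranks(marks):
    """Rank 1 for the highest mark; assumes no tied marks."""
    ordered = sorted(marks, reverse=True)
    return [ordered.index(m) + 1 for m in marks]


def spearman_rho(first, second):
    """Spearman's rank correlation for untied data:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(first)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(first), ranks(second)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented totals for the same six scripts under two ways of marking.
flexible = [19, 17, 16, 14, 12, 11]
strict = [15, 9, 10, 6, 3, 1]

print(max(flexible) - min(flexible), max(strict) - min(strict))  # range of marks
print(round(spearman_rho(flexible, strict), 3))                  # agreement in rank order
```

In this invented case the strict marking spreads the totals over a wider range, yet only two neighbouring pupils exchange places in the rank order.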


Once the marking scheme is written, one may like to forget
it for some time and keep it away. Since the first draft of a
question is far from being perfect, the marking scheme can be
reviewed after a few days. It is better if the reviewing and editing of
questions is done by more than one independent person who is
well conversant with item-writing technology besides possessing
mastery of the subject.


7. Limitations 


In spite of the fact that structured questions find a prominent
place in today’s literature and are even used in selection tests,
they have some inherent limitations. These questions are very
difficult to construct for two reasons. First, it is really a tough
job to develop an introductory passage which provides
the needed base material and is at the same time not a
simple replication of textual material. Secondly, all the
sub-questions related to the main theme should not only test the
higher level abilities but should also not overlap and interlock.
It is thus a difficult task to frame such questions.

These questions demand a lot of time in structuring good
questions that serve the intended objective and at the same time
are economical from the point of view of testing.

They involve a lot of reading load on the part of the exami- 
nees and, therefore, quite a good deal of time is wasted in 
reading and re-reading the passage in order to answer the 
questions that are set on the passage. For every question the 
examinee has to go through the passage, again and again. 

Besides, these questions are not economical in the sense
that for testing one major theme or central issue, one has to
utilise three to five questions. Moreover, when such questions are
used as a part of the annual public examination, one is forced
to restrict the content to only one unit and, therefore,
proportionate emphasis on different topics may not be ensured. As
such the content-validity of a test in which too many structured
questions are used is likely to be reduced.

Since these questions draw heavily on students’ ability to
comprehend, they presuppose a certain level of comprehension
on the part of students. If students have not developed a minimum
level of comprehension such questions cannot be used.
Therefore, their use at lower stages, like the primary stage, becomes
impossible.


8. To Sum Up 


From the foregoing discussion on structured questions, it can
be inferred that these questions can be used only for certain
purposes for pupils of a particular maturity level. These
questions are very useful when some complex problem or a major
theme is to be thoroughly investigated and demands the
interplay of various higher level intellectual processes that provide
evidence about the student's command of certain specific
concepts and principles.

Their use at the unit-testing level cannot be over-emphasised
because of their focus on testing different abilities ranging from
the simple to the complex. However, their use in selection tests
can be questioned on the basis of the time requirement per
item. Structured questions presently in vogue, if analysed,
would reveal that more than 50% of the sub-questions can be
answered irrespective of the basic introductory material or the
passage provided. This indicates the inherent difficulty of
framing independent test items testing different abilities based
on the given passage. Their use in classroom testing is quite
diagnostic in discovering students' inabilities. Moreover, their
use in the form of oral questions based on a particular diagram
or data, which can be presented to the class on the blackboard,
can be of much use. Thus, structured questions are not
economical to use unless one attempts to test the higher level
abilities. Nevertheless, these questions provide a good challenge
to the item writers as well as to the teachers, who can make use
of these questions effectively after having acquired the technical
knowhow of their construction as well as the usability of such
questions.




CHAPTER XII


QUALITIES OF A GOOD MEASURING
INSTRUMENT


Whenever we measure students’ achievement by using a
teacher-made test or a standardised test we use the data for making
decisions on the assumption that the measuring instrument is of
good quality. In the case of standardised tests, the manual of
such tests gives the coefficient of validity and that of reliability,
but in most of the teacher-made tests the results of examinations
are used without taking into cognizance the quality of the
instrument used. In fact most class teachers are not conversant
with the characteristics of a good examination. Once they
appreciate the need for understanding the characteristics of a good
achievement test, they would be able to interpret test results
more scientifically. After going through this Chapter the
reader would be able to:


(a) define usability, reliability and validity of measuring
instruments in his own words,

(b) identify the relationship between reliability and validity
with respect to the nature and scope of each and the factors
affecting them,

(c) classify various sources of variance relating to individual, 
administration and chance factors, 

(d) estimate reliability coefficients using different methods and
understand the limitations of each method for interpretation of
test results,




(e) define, describe and classify various types of validity and 
the implications of each for test construction, and 

(f) apply the concept of reliability and validity in judging the 
quality of the measuring instruments. 


The three major requirements of a good test are validity,
reliability and usability. Each of these is discussed in this
chapter. Since validity is a more comprehensive concept than
reliability, and reliability is a more important concept than
usability, I would prefer to discuss these concepts in the
reverse order.


1. Usability 


A test is said to be usable if it is easy to construct, easy to
administer and easy to interpret. An essay test is easier to
construct and administer than a test of multiple-choice items but
it is more difficult to interpret than a test consisting of
multiple-choice items. Administrability of evaluation refers to
the ease and accuracy with which the evaluators and the examinees
can follow the directions. The requirements of directions vary
from test to test. Moreover, if the test is too time-consuming to
administer, like the Stanford-Binet Intelligence Test, its
usability is restricted. This test is administered individually
and requires about 1 hour for the examinee and half an hour for
the examiner. The directions are so complex and detailed that it
requires a one-semester course to learn to administer and score
this test. Essay-type tests are easy to administer, whereas an
objective test requires more paper, and hence more expenditure,
and has to be administered separately with its own time limit. It
therefore has to be distributed and collected separately, with a
number of instructions to the examinees. Tests involving
sub-tests requiring a separate time limit for each sub-test are
more susceptible to errors in timing and require the
administrator of the test to have a time-piece during
administration.

Some tests require no supervision. Oral instructions may vary
from room to room if these are not given in the test itself or
are read from the paper. When given in the test itself, they may
be separated from the test material so as to avoid distraction.
It is better to fit the length of the test to the classroom
period. In the case of a standardised test, the cost of the test
is another factor, besides the format of the test, which may
facilitate the mode of responding to the test questions and/or
writing the responses at a specified place designated for the
responses. Tick-marking or encircling a true-false item against
each statement is easier if the blank space or the letters T and
F are provided on the right instead of to the left of each
statement.

Using a separate answer sheet in the case of an objective-type
test enhances the usability of the test, as it becomes easier to
analyse the results, besides making it possible to reuse the test
material. The cost of analysing the scores, the complexity of the
operation, and the provision of a scheme for marking and
summation of responses also reflect on the administrability of
the test.

Interpretability refers to the degree to which the scores of a
test are readily derived and understood. Applying a scoring key
to get raw scores, listing of credit or value points in the
marking scheme, use of correct formulae, machine or manual
scoring, arrangement of items in the test in terms of objectives,
content, form of questions, difficulty level (estimated), etc.
have a bearing on the interpretability of scores. To give meaning
to the raw score, the use of the group norm or the age norm would
help in interpretation. The use of a referent, i.e. self, group
or criterion, is another variable that will determine the mode of
interpretation. Should we estimate or predict? Should we compare
with the norm to look for deviation of students' scores from the
norm, or should we compare with the predetermined criteria of
expected performance in terms of mastery of learning? Are we
interested in grading students, providing remedial instruction or
judging instructional efficiency? These are some of the cogent
factors that make the scores meaningful for interpretation
purposes.

Thus usability involves judging of distractions, distribution of
papers, carrying out of instructions, timing, supervision, ending
of the test, making notations, security, malpractices, etc. for
administration of a test. The order of scoring, rescoring,
marking schemes, scoring devices, keys, machine and hand scoring,
recording, tabulating, summarising, and profiles are relevant to
the scorability of the test. Precautions like rapport between the
examiner and the examinees, preparation of examinees, obtaining
material, administration of tests to pupils, scoring of tests and
the use of test results are the different phases of a testing
programme which have a direct or indirect bearing on the
usability of the test.


2. Reliability 


A more important universal criterion of a good measuring device
is reliability. Whereas validity deals with what to measure,
reliability is concerned with how efficiently the device
measures. The former deals with relevance to the purpose,
objective or intended outcome, while the latter concerns itself
with the accuracy and consistency of measurement. Reliability is
an essential part of validity. Both relevance and reliability
constitute validity. Therefore, validity to a certain extent
depends on the reliability of the test, but the converse is not
true. A test may be totally invalid, yet reliable because of its
consistency and accuracy of measurement. A test must be reliable
in order to be valid.
Whenever any measurement is made, it has to be repeated once more
or a number of times to be sure that our measurement is accurate
and precise. But it is seldom that we find, on repetition, the
measurement to be the same. There is always some variation,
although the extent of variation would vary depending upon the
precision of the instrument, the user of the instrument and many
other factors.

I poured three bottles of petrol into my scooter five times in a
month but the mileage was not always the same. It was 32 km,
31 km, 32 km, 33 km, and 32 km per litre. I kept a record for a
month. As the traffic and the weather conditions were the same
throughout the month there should have been no change in the
mileage. Had there been a constant mileage of 32 km, I would have
been satisfied that it was the true mileage. But why did the five
measures vary, although the true mileage (32 km) did not vary?
This may have been due to chance errors which might have crept in
due to one or more of the following reasons:


(a) Three litres of petrol may not be exactly three litres indicat- 
ed at the petrol pump; it may be a little more or a little 
less. 

(b) Over- or under-estimation of the amount due to reading of
the indicator, which shows only 1/10th divisions of each
litre.

(c) The milometer might not have registered accurately the 
distance travelled. 

(d) The reading of the meter may not have been accurate every 
time, sometimes the nearest digit was rounded off to the 
next and sometimes to the last digit. 

(e) Differences due to observation of the level of the tank by
the attendant, etc.


There may be many more such factors. Still we find that the 
mileage averaged was fairly constant. Such a consistency is 
desirable in our educational measurement. A good measuring 
instrument should yield dependable information or scores when 
used repeatedly in unchanging situations. 
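The scooter readings can also be summarised numerically. The following is only an illustrative sketch in Python (the variable names are chosen here and are not part of the original text): it takes the five mileage figures quoted above and computes their mean and standard deviation, one simple way of expressing how far repeated measurements scatter around the presumed true value.

```python
from statistics import mean, pstdev

# Five mileage readings (km per litre) quoted in the text.
readings = [32, 31, 32, 33, 32]

average = mean(readings)    # central value of the repeated measurements
spread = pstdev(readings)   # standard deviation of the five readings

print(f"mean = {average} km/l, standard deviation = {spread:.2f} km/l")
```

A small spread relative to the mean is what the text describes as a "fairly constant" measurement.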


2.1. Sources of Variance 


Scores on a test do not remain constant when the same or similar
tests are repeated. This is due to systematic or unsystematic
variations caused by systematic or unsystematic factors.
Systematic variations are characterised by some trend, orderly
pattern, or regular increase or decrease in scores, associated
with age, sex, training, growth, etc. On the other hand, factors
like fatigue, seating arrangements, disturbance or noise, poor
flow of the pen, etc. are random in character and cause changes
which vary from moment to moment, thus causing unsystematic
variations.

The various sources of variance are categorised in the chart
"Sources of Variance (after Lindquist)" that follows.


2.2. Definitions 


1. By reliability we mean the consistency of the measurement
of a particular achievement from time to time or from test to
test (Ebel) (1)


Sources of Variance (after Lindquist) (2)

Individual
    Lasting and general characteristics
        1. Level of ability
        2. Test-wiseness
        3. Ability to comprehend instructions
    Lasting and specific characteristics
        1. Ability on particular traits tested in a test
        2. Skills in particular form of test items etc.
        3. Chance in particular test items
    Temporary but general characteristics
        1. Fatigue
        2. Light
        3. General test-wiseness
        4. Motivation
        5. Mechanics of testing
        6. External conditions, like heat, light, sound
    Temporary and specific characteristics
        1. Comprehension of specific task
        2. Specific tricks or techniques dealing with a test
        3. Level of practice
        4. Fluctuation in memory
        5. Unpredictable fluctuation in attention

Administration
    Testing conditions
        1. Adherence to time limit
        2. Distractions
        3. Clarity of instructions
        4. Invigilation
        5. Practice items etc.
        6. Bias in subjective rating
    Scoring
        1. Bias
        2. Halo effect
        3. Generosity error
        4. Penalty error
        5. Effect
        6. Factors etc.

Chance
    Luck in choice of response by examinee


2. Reliability ... is simply the extent of unsystematic
variation in the quantitative descriptions of the amount of
some trait an individual possesses or manifests when that trait
is measured a number of times (Ghiselli) (3)

3. Reliability of any set of measurements is logically defined as
the proportion of their variance that is true variance
(Guilford) (4)


4. Reliability concerns the precision of measurement regardless
of what is measured. Some random error is involved in all
scientific measurements ... Psychological measures also contain
a portion of random error analogous to that of the mental ruler
(Nunnally) (5)


5. “Reliability concerns the accuracy and consistency with
which it measures whatever it does measure in the group
with which it is used. To be valid, that is, to serve its
purpose adequately, a test must measure something with
reasonably high reliability, and that something must be fairly
closely related to the function it is used to measure”
(Thorndike) (6)


2.3. Why Do Scores Need to be Known?


Scores need to be known for comparison among individuals,
assignment of individuals to groups, prediction of other types of
behaviour from one score, comparison of the traits and abilities
of an individual as in guidance, or assessment of the effects of
systematic factors on an individual's performance, i.e. for
dependable deductions concerning test properties. Usual test
practices are based on the assumption that all examinees are
predictable.


2.4. What is Reliability? 


A test is for obtaining a score for an individual. Hence what to
measure is assumed. The next question is how to measure: how
accurately or precisely will the test measure? Reliability is the
accuracy with which a score represents the status of an
individual in whatever aspect the test measures him. The ideal
test tells the truth consistently. Hence the score should remain
the same over repeated measurement, or, if another similar test
is administered, the same score should be obtained.


Reliability and validity are different but interdependent
concepts. Validity pertains to what a test measures; reliability
pertains to how it measures. Validity pertains to relevance;
reliability refers to accuracy. Validity is known by agreement
with an external criterion; reliability is known by an internal
criterion. Consistency and stability are other words which
describe reliability.


2.5. Why Do Scores Change? 


Scores do not remain constant over repeated administrations or
for similar tests. Scores change due to systematic factors and
unsystematic factors. Systematic factors cause systematic
variation characterised by an orderly progression or pattern,
with the scores obtained by an individual changing from one
occasion to the next in some trend. Systematic changes appear as
a regular increase or decrease in scores, or they may follow some
cycle. Sex, age, or place of birth are some systematic factors.
Learning, training and growth are some other factors which show
systematic variation.

Unsystematic factors, like environmental, acquired or native
determiners of behaviour, produce unsystematic variations, which
are random in character. The changes may be from moment to
moment. Some examples are fluctuations in attention, an
inconsistent or bulky pen, lighting, seating, ill health, etc.

Reliability of measurement is the extent of unsystematic
variation in the quantitative descriptions of the amount of some
trait an individual possesses or manifests when that trait is
measured a number of times.


If, in a series of parallel tests, the standard deviation of
scores for a single individual is zero, then the reliability of
measurement is perfect. If this standard deviation is small,
reliability is high; if it is large, reliability is low. This is
the standard error of measurement (SEM): the standard deviation
of the scores of one individual over a series of parallel tests.
If the SEM is 3, it means that in about 68 per cent of cases the
scores may vary by up to 3 points on either side. Thus this index
gives an indication of the absolute accuracy of test scores.


Another way to describe it is to say how well scores for a
group of individuals remain consistent over two parallel tests.
This is called the reliability coefficient. It is the index of
the extent to which scores on any one parallel test can predict
scores on any other. Operationally, it is a kind of
self-correlation of a test.


2.6. True Scores and Error Scores 


Unsystematic factors contribute to the scores we get, in some 
degree or other. Hence the score we get for an individual can 
be described as: 

Obtained score = True score + Error score

(which may be positive or negative)
This may be expressed as:

$$X_t = X_\infty + X_e$$

where:

$X_t$ = obtained score
$X_\infty$ = true score
$X_e$ = error component


Three assumptions underlying the above are: 


(a) The individual possesses stable characteristics or traits that 
persist through time. 
(b) Errors are completely random, i.e., error scores are
uncorrelated with scores on any other variable, and with true
scores.


(c) Obtained scores are the result of the addition of true and 
error scores. 


“The existence of error scores leading to error variance, 
corresponds to the fact of unreliability, and its amount relative 
to the total of all variance is a measure of the degree of 
unreliability.” (Lindquist) 

If $\sigma_\infty^2$ is the variance of the true scores,
$\sigma_e^2$ is the variance of the errors of measurement, and the
magnitude of the error of measurement is unrelated to the
magnitude of the true score, then:

$$\sigma_t^2 = \sigma_\infty^2 + \sigma_e^2$$

(total variance = variance of true scores + variance of error
scores)

These variances can be related to the reliability coefficient by
means of the following equation:

$$r_{tt} = \frac{\sigma_\infty^2}{\sigma_t^2}$$ (1)

Therefore, reliability is the proportion of true variance in the
obtained test scores.

Since $\sigma_\infty^2 = \sigma_t^2 - \sigma_e^2$, we can
substitute the difference in this to get

$$r_{tt} = 1 - \frac{\sigma_e^2}{\sigma_t^2}$$

Since $\sigma_\infty$ and $\sigma_e$ are not known, other formulas
have been devised for estimating true variance.

The relationship between the standard error of measurement (or
standard error of an obtained score) and the reliability
coefficient is expressed by the equation

$$\mathrm{SEM} = \sigma_e = \sigma_t\sqrt{1 - r_{tt}}$$
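The relations between true variance, error variance, the reliability coefficient and the standard error of measurement can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the figures (true variance 80, error variance 20) are assumptions made up for this example, and in practice the true and error variances are not directly observable.

```python
import math

def reliability_from_variances(true_var, error_var):
    """Return true variance as a proportion of total (obtained) variance."""
    total_var = true_var + error_var
    return true_var / total_var

def standard_error_of_measurement(total_sd, reliability):
    """SEM = SD of obtained scores times sqrt(1 - reliability)."""
    return total_sd * math.sqrt(1.0 - reliability)

# Hypothetical figures, chosen only for illustration.
true_var, error_var = 80.0, 20.0
total_sd = math.sqrt(true_var + error_var)

r_tt = reliability_from_variances(true_var, error_var)
sem = standard_error_of_measurement(total_sd, r_tt)

print(f"reliability = {r_tt:.2f}, SEM = {sem:.2f}")
```

With these figures the SEM comes out equal to the square root of the error variance, as the equations require.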


Reliability Coefficient from Item Statistics 

The split-half estimate of $r_{tt}$ depends upon how the split
halves have been obtained. Kuder-Richardson formulas depend
instead upon item statistics. Each item is considered as a
separate test. Hence all items are assumed to be of equal
difficulty and equal intercorrelation. The formula is

$$r_{tt} = \left(\frac{n}{n-1}\right)\left(\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}\right)$$

where n = number of items in the test,
$\sigma_t^2$ = variance of the obtained test scores,
p = proportion of correct responses to each item,
q = 1 - p.

The reliability is obtained from a single administration of the
test.
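As an illustration of the Kuder-Richardson computation, the following Python sketch applies the formula to a small table of right/wrong item scores. The function name `kr20` and the data are hypothetical, and the population form of the variance (dividing by the number of examinees) is an assumption made here; textbooks differ on this detail.

```python
def kr20(responses):
    """Kuder-Richardson reliability for items scored 1 (right) or 0 (wrong).

    responses: list of rows, one row per examinee, one 0/1 entry per item.
    """
    n_people = len(responses)
    n_items = len(responses[0])

    # Variance of total scores (population form, dividing by N).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people

    # Sum of p*q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1.0 - p)

    return (n_items / (n_items - 1)) * (var_total - sum_pq) / var_total

# Hypothetical data: 5 examinees, 4 items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3))
```

Only one administration of the test is needed, since every quantity in the formula comes from the same score table.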


Reliability from Analysis of Variance
In this, as in the above method, arbitrary methods of choosing
the particular two halves are dispensed with. All the
information about consistency of performance from item to item
within the test is made use of to obtain an estimate of internal
consistency (see Lindquist, p. 590). The analysis of variance
approach is more useful for obtaining reliability estimates from
items or trials scored with a range of scores.
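One well-known form of the analysis of variance approach treats the score table as a persons-by-items layout and estimates reliability as one minus the ratio of the residual mean square to the mean square between persons. The sketch below assumes that form; it is not necessarily the exact procedure given on the page of Lindquist cited above, and the function name and the marks are hypothetical.

```python
def anova_reliability(scores):
    """Reliability from a persons-by-items analysis of variance.

    scores: list of rows (one per examinee), each a list of item scores.
    Returns 1 - MS_residual / MS_persons.
    """
    n = len(scores)        # number of examinees
    k = len(scores[0])     # number of items or trials
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into persons, items and residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1.0 - ms_residual / ms_persons

# Hypothetical marks: 4 examinees, 3 questions each marked out of 5.
marks = [
    [5, 4, 5],
    [3, 3, 4],
    [2, 3, 2],
    [1, 1, 2],
]
print(round(anova_reliability(marks), 3))
```

Because the items here are scored over a range of marks rather than right/wrong, this is the kind of case for which the text recommends the approach.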




2.7. Factors Affecting Reliability 


2.7.1. Factors associated with the test itself 


(i) Item homogeneity 

If the items are homogeneous within the test, higher
intercorrelations, and therefore higher reliability, are ensured.
A perfectly homogeneous test will have the highest reliability
but the validity of the test will suffer. K-R formulas depend on
the complete homogeneity of function from item to item.


(ii) Item difficulty 

The reliability of a test is the highest if the items are of
median difficulty, or have a difficulty level of 0.5 (in Ebel,
Guilford). When the items vary widely in difficulty, a smaller
reliability coefficient is obtained. Hence in short tests, items
of the difficulty level 0.5 can be pooled. In longer tests, some
spread of item difficulty is desirable. It is better to have a
rectangular distribution as is achieved in such a test
(Guilford). In the case of multiple-choice items, where chance
makes it easier to get higher scores, the difficulty level of
items may be greater than 0.5, say 0.6 (Guilford).


(iii) Type of items 
Higher reliability will be obtained in the case of
multiple-choice type items.


(iv) Items based on memory


If items are based on memory, reliability will not be high as 
compared to items testing higher objectives. 


(v) Length of the test 
The reliability of the test increases with increase in the number
of items, or the length of the test. The items added should
resemble the old ones in form and content, and should have an
equal difficulty level and equal intercorrelation with one
another and with the old items. (This amounts to combining two
parallel forms.) (See Guilford, p. 352). Both the K.R. formula
and the Spearman-Brown Prophecy formula suggest why this happens.




From the Spearman-Brown formula:

$$r_n = \frac{n\,r_s}{(n-1)\,r_s + 1}$$ (8)

“$r_n$, the reliability of a test $n$ times as long as a shorter
test of known reliability $r_s$, is equal to $n$ times the
reliability of the shorter test, divided by $(n-1)$ times the
reliability of the shorter test plus 1.”

It is possible to calculate the number of times a test has to
be lengthened to get the required reliability coefficient. (See
Ebel, p. 315). If the initial reliability is low, the first
increase in length will result in a larger gain than successive
increases in length. If the length has to be increased by many
times the original size to get optimum reliability, other methods
have to be adopted.
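Both uses of the Spearman-Brown formula, predicting the reliability of a lengthened test and finding how many times a test must be lengthened, can be sketched in Python as follows. The function names and the example reliabilities (0.60 and 0.90) are chosen here only for illustration; the second function is simply the same formula solved for n.

```python
def spearman_brown(r_short, n):
    """Reliability of a test n times as long as one with reliability r_short."""
    return n * r_short / ((n - 1) * r_short + 1)

def lengthening_factor(r_short, r_target):
    """Number of times a test must be lengthened to reach r_target.

    Obtained by solving the Spearman-Brown formula for n.
    """
    return r_target * (1 - r_short) / (r_short * (1 - r_target))

# Hypothetical example: a short test with reliability 0.60.
doubled = spearman_brown(0.60, 2)
factor = lengthening_factor(0.60, 0.90)

print(f"reliability when doubled in length: {doubled:.2f}")
print(f"lengthening needed for 0.90: {factor:.1f} times")
```

A large lengthening factor of this kind is the situation in which, as the text notes, other methods of improving reliability have to be adopted.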

In the case of multiple-choice items, when the number of
response options is increased, the reliability of the test also
increases, as shown by Remmers and others (see Lindquist).
Thorndike points out that a multiple-choice item behaves as if
it were composed of k-1 two-choice items, where k varies
from 2 to 5.

In the case of subjectively scored items, such as essay-type 
questions, increasing the number of raters or examiners has the 
same effect as that of increasing the length of the test. 

In many tests, the length may have to be decreased to
increase the reliability. Discarding poor items will achieve this
purpose. The formula is useful in another way. It enables us to
see if we can safely shorten a test without appreciably lowering
the reliability. (Guilford, F.S., p. 466).


2.7.2. Factors associated with the group 


Range of ability 

Test reliability depends upon the range of ability of the
population sample taken for administration of the test. If the
sample is heterogeneous, the reliability coefficient will be
higher. If the range of scores is narrower, the reliability
coefficient will be smaller. This is depicted in the following
diagram given by Guilford. (9)


[Figure: scatter diagram in which a smaller rectangle marks the
restricted range inside a larger rectangle marking the complete
range. Correlation for the cases within the smaller rectangle is
much smaller than that for cases within the larger rectangle.]


Hence reliability should be found out from administration 
over a sample of high range of scores or greater variability. 

Reliabilities of tests reported in manuals have to be eva- 
luated against this background. If the test had been 
administered to children from one particular grade, the 
reliability reported is generally on the lower side, whereas relia- 
bility got from administration to a single age-group is on the 
higher side. (See Thorndike and Hagen, p. 186) (See Guilford, 
p. 392 for a formula for estimating the reliability of a test when 
used in a population of one SD from knowledge of its reliability 
in a population with a different SD). 
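The effect of group variability on the reliability coefficient can be illustrated with a commonly used relation which assumes that the error variance (and hence the standard error of measurement) is the same in both populations. Whether this is exactly the formula on the page of Guilford cited above is an assumption; the function name and figures below are hypothetical.

```python
def reliability_in_new_group(r_old, sd_old, sd_new):
    """Estimate reliability in a group with a different standard deviation.

    Assumes the error variance is the same in both groups, so that only
    the true-score variance changes with the spread of the group.
    """
    error_var = sd_old ** 2 * (1.0 - r_old)   # error variance, assumed constant
    return 1.0 - error_var / sd_new ** 2

# Hypothetical manual value: reliability 0.90 in a group with SD 10.
# Estimated reliability in a more homogeneous group with SD 5:
restricted = reliability_in_new_group(0.90, 10.0, 5.0)
print(f"estimated reliability in the restricted group: {restricted:.2f}")
```

The drop in the coefficient illustrates why reliabilities reported in manuals have to be read against the variability of the group on which they were obtained.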

A test, if very difficult or very easy, will not have a high
reliability. If it is difficult, it leads to guessing, and hence
low reliability. If it is easy, there is very low variability,
and hence poor reliability. The test may have different
reliabilities at different score levels. If the sample is too
small, there will be less variability, and hence the reliability
will be low.


2.7.3. Factors associated with test administration


(a) Provision of instructions in the test 
(b) Provision of practice exercises 
(c) Accuracy of timing 
(d) Sufficiency of time 
(e) Standardization of motivation 
(f) Conditions of test taking 




(g) Time between trials 

(h) Scoring methods and subjectivity of scoring 

(i) Method employed for computation of the reliability
coefficient, i.e. the way in which split-halves or parallel
forms are obtained.


2.8. Methods of Estimating Reliability 


At least two sets of measurements have to be obtained for the 
purpose. The correlation coefficient is obtained between the 
two sets of scores so obtained. 
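Whatever the method used to obtain the two sets of scores, the final step is the same: a product-moment correlation between them. A minimal Python sketch is given below; the function name and the two sets of scores are hypothetical and serve only to show the computation.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical scores of six pupils on two sets of measurements.
first_set = [12, 15, 9, 20, 17, 11]
second_set = [13, 14, 10, 19, 18, 10]

reliability = pearson_r(first_set, second_set)
print(f"reliability coefficient = {reliability:.3f}")
```

The methods described in this section differ only in how the two sets of scores are produced, not in this final calculation.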


(i) Test-retest method or repeated testing method 
There are certain definite advantages in this method. Only one
set of sample tasks or items is needed. This is held constant in
both the applications and hence no variance attributable to
changed tasks is introduced. The two applications may be done
with some interval or without any interval of time. If there is
no interval of time, the carry-over effect from one testing
occasion to another will be present (the memory factor). If the
interval is large, this effect is eliminated but changes in the
individual take place. The problem of getting the same group
again is great. The attitude of the examinees also may be a new
factor that comes into play.

Hence, in measures of intellect, temperament, or achievement,
where a large number of tasks or items forming the universe are
used, the test-retest method is not defensible. (10)

Test-retest reliability is usually higher than that based upon
parallel forms.


(ii) Parallel forms or equivalent forms 
This method gives the coefficient of equivalence and is
considered a preferred procedure for estimating reliability. It
goes well with the theory of the true and error score and the
eclectic concept of true scores and parallel tests (see Ghiselli,
p. 281).

What are parallel test forms? Each form should be built to the
same specifications, and consist of random samples of items. It
should have the same distribution of item difficulties and the
same distribution of item-test correlations. The Thorndike
equivalent forms are tests having identical true variance and no
overlap of error variance. (See Ghiselli for distinction between
parallel forms and parallel tests, p. 279).

If parallel forms are administered one after the other without
interval, the reliability resembles the split-half reliability or
internal consistency reliability. If they are administered with a
long interval, the reliability resembles test-retest reliability.

The advantage is that the error due to sampling of particular
items in the test is taken care of. There is no specific
carry-over effect due to memory. The disadvantage is the change
in content, a source of error variance. The difficulty of
building more than one form is another disadvantage. The
parallel-form reliability is usually an under-estimation of
reliability due to this factor. This method is particularly
suitable for speed tests. (See Ghiselli, p. 222).


[...] of the test. There [...] half lengths: (a) two- [...]
(b) alternate items [...] most frequently used [...] order of
difficulty, (c) [...] halves, (d) the first [...] forming the
second [...] each part).

The correlation coefficient obtained in such cases is reliability
for half t[...] Spearman-Brown's Prophecy [...] (or any one of
the other [...]



which is mostly systematic. Successive items are quite alike 
also. It gives a misleadingly high reliability coefficient. This 
reliability is a function of the choice of the split. 
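The odd-even split referred to in this section can be sketched as follows: the items are divided into odd-numbered and even-numbered halves, the two half-scores are correlated, and the half-test correlation is stepped up to full length with the Spearman-Brown formula for a test of double length. The function names and the response table are hypothetical; a different choice of split would in general give a different coefficient, which is the point made above.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def split_half_reliability(responses):
    """Odd-even split-half reliability, corrected to full test length."""
    odd = [sum(row[0::2]) for row in responses]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    # Spearman-Brown correction for a test twice as long as each half.
    return 2 * r_half / (1 + r_half)

# Hypothetical data: 5 examinees, 6 items scored 1 or 0.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(answers), 3))
```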


Sources of variation represented in different procedures for
estimating reliability*

Procedures:
(1) Immediate retest, same test
(2) Retest after interval, same test
(3) Parallel test form without time interval
(4) Parallel test form with time interval
(5) Odd-even halves of single test
(6) K.R. analysis of single test

How much the score can be expected to fluctuate owing to:

Sources of variation                     (1)  (2)  (3)  (4)  (5)  (6)
(a) Variations arising within the
    measurement procedure itself          x    x    x    x    x    x
(b) Changes in the individual from
    day to day                            -    x    -    x    -    -
(c) Changes in the specific sample
    of tasks                              -    -    x    x    x    x
(d) Changes in the individual's
    speed of work                         x    x    x    x    -    -

(x = source of variation represented; - = not represented)

*Source: "Measurement and Evaluation in Psychology and Education" by
Thorndike and Hagen (1969), p. 186.


2.9. Interpretation of Test Reliability 


Standard Error of Measurement 
Since the reliability coefficient is dependent upon the variability 
of the group and not merely on the quality of the test, the 
standard error of measurement is used sometimes in place of the 
reliability coefficient. It can be used even when a group differs 
from the original group. 

“Standard Error of Measurement (SEM) is equal to the standard
deviation S of the test times the square root of 1 minus the
reliability of the test.” (11)

$$\mathrm{SEM} = S\sqrt{1 - r_{tt}}$$

where SEM is the standard error of measurement, S is the standard
deviation of the test and $r_{tt}$ is the reliability
coefficient.

It provides an indication of the absolute accuracy of the test
scores.

Interpreting: If the standard error of measurement is 3, then for
about 68 per cent of cases the obtained scores will be within 3
points, more or less, of the true scores.
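The interpretation of the standard error of measurement as a band around an obtained score can be sketched in Python as follows. The function names, the test statistics (standard deviation 10, reliability 0.91) and the obtained score of 60 are hypothetical values chosen so that the SEM works out to 3, as in the example in the text.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(obtained, sd, reliability):
    """Band of one SEM on either side of an obtained score.

    About 68 per cent of obtained scores are expected to fall within
    one SEM of the corresponding true score.
    """
    error = sem(sd, reliability)
    return obtained - error, obtained + error

# Hypothetical test: standard deviation 10, reliability 0.91.
low, high = score_band(60, 10.0, 0.91)
print(f"SEM = {sem(10.0, 0.91):.1f}; band = {low:.1f} to {high:.1f}")
```

Because the band is expressed in score points, it can be carried over to a group whose variability differs from that of the original group, which is the advantage claimed for the SEM above.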


How high should reliability be? 


For evaluating reliability coefficient, the following data are 
needed: 


(a) the particular set of operations by which consistency of 
performance was obtained, 


(b) the description of the group tested, 
(c) statistical characteristics of the group, and 
(d) adequacy of sampling. 


There is no hard-and-fast rule as to how high reliability
should be. It can be low for research purposes (Guilford,
p. 389). More crucial is the validity of the test. For practical
purposes of diagnosis and prediction it should be higher, even
at the cost of validity. It is possible to have a reliability
coefficient of 0.90 for good objective-type tests but 0.80 is
considered a standard (Ebel, p. 330).


3. Validity 


3.1. Some Definitions


(i) Lindquist (12) 

“The validity of a test may be defined as the accuracy with
which it measures that which it is intended to measure, or the
degree to which it approaches infallibility in measuring what it
purports to measure.” According to this definition, to determine
how valid a test is, one must compare the reality of what it DOES
measure with some ideal conception of what it OUGHT to measure.


(ii) Cureton (13)

“The validity of a test is an estimate of the correlation between
the raw test scores and the ‘true’ (i.e. perfectly reliable)
criterion scores.” This definition does not answer the question,
“What should the test measure?” Criterion scores that measure
exactly the same thing as the test is intended to measure are
seldom available, even in an unreliable form, for most of the
classroom tests of educational achievement.


(iii) Edgerton (1949) 
“By validity we refer to the extent to which the measuring
device is useful for a given purpose.”


(iv) Cronbach (1960)
“The more fully and confidently a test can be interpreted, the
greater its validity.”


(v) Guilford (1946)
“In a very general sense, a test is valid for anything with which
it correlates.”




(vi) Webster's New Collegiate Dictionary (1951, p. 940)
“A test, or anything else, is valid if it is ‘founded on truth or
fact; capable of being verified, supported or defended; well
grounded, sound.’”


(vii) Gulliksen (1950) 
“The validity of a test is the correlation of the test with some
criterion.”


(viii) English and English (14) 

“A property of the whole measuring or testing process, but
specially of the testing instrument, that ensures that the
obtained test scores correctly measure the variables they are
supposed to measure. Validity is always validity for measurement
of a particular variable. There is no such thing as general
validity, nor is there any absolute validity; we determine the
degree of validity, and the validity index has no meaning apart
from the particular operations by which it is determined.”

From the above mentioned definitions, we can infer that
evaluators differ in their concept of validity. Some specialists,
like Gulliksen, specify correlation with a criterion. Some, like
Cureton, require estimation of a corrected coefficient. Others emphasise
accuracy in relation to the user's intent. Still others relate it to
the utility and interpretability of test scores. It therefore seems
difficult to arrive at a common conceptual definition, and even
if one is formulated it may be so abstract that no useful purpose
would be served. In fact there is a need for an operational
definition of the term validity.
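
The two statistical definitions above can be made concrete with a small illustration. The sketch below is not from the handbook: the scores are invented, the function names are hypothetical, and the criterion reliability of 0.64 is an assumed value. It computes a validity coefficient in Gulliksen's sense (the Pearson correlation of test scores with a criterion) and then, in the spirit of Cureton's definition, estimates the correlation with a perfectly reliable criterion using the standard correction for attenuation in the criterion.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2 pupils)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correct_for_criterion_unreliability(r_xy, criterion_reliability):
    """Estimate the correlation of raw test scores with 'true' criterion scores."""
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_xy / math.sqrt(criterion_reliability)

# Invented data: raw scores of eight pupils and their teacher's ratings.
test_scores = [12, 15, 9, 20, 17, 11, 14, 18]
teacher_ratings = [4, 3, 3, 4, 5, 2, 4, 3]

validity = pearson_r(test_scores, teacher_ratings)
corrected = correct_for_criterion_unreliability(validity, criterion_reliability=0.64)
print(f"validity coefficient (Gulliksen): {validity:.2f}")
print(f"corrected for criterion unreliability (Cureton): {corrected:.2f}")
```

The corrected value is always at least as large as the raw coefficient, since an unreliable criterion can only lower the observed correlation.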


3.2. Types of Validity 


(i) Concurrent validity

Validity is concerned with the relation of test scores to an
accepted contemporary criterion of performance on the variable
which the test is intended to measure, e.g. correlation of raw
scores on a test with teachers' rankings or with scores on a
standardized test.


(ii) Construct validity
Validity is concerned with "what psychological qualities a test
measures" and is evaluated "by demonstrating that certain
explanatory constructs account to some degree for performance on
the test."


(iii) Content validity
Validity is concerned with the adequacy of sampling of a
specified universe of content.


(iv) Curricular validity

Validity is determined by examining the content of the test
itself and judging the degree to which it is a true measure of
the important objectives of the course, or a truly representative
sampling of the essential materials of instruction.


(v) Empirical validity
Validity refers to the relation between test scores and criterion
scores, the latter being an independent and direct measure of
that which the test is designed to predict.


(vi) Face validity
Validity refers, not to what a test necessarily measures, but
to what it appears to measure.


(vii) Factorial validity

The validity of a test is the correlation between that test and
the factor common to a group of tests or other measures of
behaviour. Such validity is based on a factor analysis.


(viii) Intrinsic validity
Validity arising from the fact that the items of a test inherently
and necessarily call for the behaviour that the test as a whole is
designed to measure.


(ix) Sampling validity
It is a measure of content validity obtained by determining
how far the test items are a representative sample of the
universe of behaviour that defines the variable to be measured.


(x) Status validity
It refers to the correspondence between test scores and the
concurrent status of tests.


(xi) Predictive validity 
Validity is concerned with the relation of test scores to measures 
on a criterion based on performance at some later time. 


(xii) Validity by definition 
For some tests the objective is defined solely in terms of the 
population of questions from which the sample comprising the 
test is drawn e.g. when the ability to handle one hundred 
number facts of addition is tested by a sampling of those 
number facts. 

Although all these validities are not distinct from each 
other, still some differences appear to justify their classifica- 
tion into two major categories. 


3.3. Validity Classified 


Ebel classifies all types of validities into two major categories 
viz. Direct and Derived. 


(i) Direct Validity 

A test has direct, primary validity to the extent the tasks in- 
cluded in it represent faithfully, and in due proportion, the 
kinds of tasks that provide an operational definition of the 
achievement or the trait in question. 


(ii) Derived Validity

A test has derived, secondary validity to the extent that the
scores it yields correlate with criterion scores which possess
direct, primary validity.

Thorndike and Hagen (15) have dichotomised all these types
into two: those which depend primarily on rational analysis and
professional judgement (Direct validity); and those which depend
on empirical and statistical evidence (Derived validity).


Direct Validities                 Derived Validities
(a) Validity by definition        (a) Empirical validity
(b) Content validity              (b) Concurrent validity
(c) Curricular validity           (c) Predictive validity
(d) Intrinsic validity            (d) Factorial validity
(e) Face validity                 (e) Construct validity


This division is not sharp but is convenient for classification.

From the above definitions, we find a number of terms used
to qualify validity. The three main purposes for which
psychological measurement is used are:


(a) Representation of a specified universe of content.

(b) Measurement of psychological traits.

(c) Establishment of a functional relationship with a particular
variable.


Accordingly we have three types of validity and can classify
various terms under each type:


                            Validity
       |                       |                       |
  Curricular               Construct              Predictive
  1. Intrinsic             1. Trait               1. Post diction
  2. Content               2. Factorial           2. Concurrent
  3. Rational              3. Convergent          3. Prediction
  4. Face                  4. Nomological         4. Empirical
  5. Sample                5. Discriminant        5. Statistical
  6. Representativeness                           6. Practical
  7. Logical                                      7. Classical
                                                  8. Status


It is again obvious that there are some common conceptual
elements in these definitions, but at the same time it is only
possible to have a loose and general definition because of the
striking differences between them. What is actually needed is a
more concrete and realistic conception of the constituents that
make a test qualitative.

Another anomaly is that although the test specialists regard
validity as the most important quality of a test, yet we find it
lacking in most of our present tests according to the comments
of the specialists. It can only be due to lack of understanding
of the validity concept.
Likewise, another difficulty arises from questioning the
validity of a measuring procedure rather than looking for an
operational definition of quantitative concepts, as in the case of
physical measurements. An atomic clock, for example, is not
justified on the basis of its superior validity but because of its closer
approximation to the measurement of true time, which reflects, in
fact, superior reliability. Therefore, instead of asking for the
validity of the basic method of measurement, which is meaningless,
we should be satisfied with an operational definition of the
thing to be measured.


3.4. Can We Define Validity? 


Validity is regarded as the most important characteristic of a
good measuring device. It refers to the degree to which a test
measures what it purports to measure. This definition needs
further elucidation: what does a test actually measure, to what
extent, in what situations, under what conditions, and how well
does it do its job? Suppose, for example, a Biology test is designed
to measure an instructional outcome like the understanding of
concepts. Measuring this outcome is what makes it valid, but if it
measures something else also, perhaps drawing skills, its validity
as a measure of understanding of concepts is reduced. This test may
be highly valid for 10th class students but comparatively less valid
for 11th class students and still less for college students. Similarly,
a test may be a highly valid measure for predicting pupils' academic
success in class XI but less valid for predicting success in admission
to a medical college and a still less valid measure of the success of
a medical practitioner on the job. Therefore, a test is valid for a
certain purpose and may be wholly invalid for another purpose.

A test is never completely valid or entirely invalid. We are
mainly concerned with how valid the test is, that is, how well it
serves the purpose. It may do so extremely well, reasonably well or
hardly at all.


3.5. Implications of Validity
Before taking up the problems and difficulties in determining
the validity of our tests, it may be useful to understand the
implications of the three major types of validity.




3.5.1. Curricular validity 

When a test is based on the content and instructional objectives
of the course, its curricular validity depends on the degree to
which the test items represent these aspects. Therefore, if on
analysing the test content alongside the course content and
instructional objectives we find that the former represents
the latter, the test has curricular validity. Because this
analysis is based on rational judgement, this type of validity is
also called rational or logical validity. It is also called content
validity or sample validity because the analysis is mainly in
terms of course content. However, it should not be construed that
content is the only thing we are interested in when devising an
instrument. It is both the content (concepts, learning points)
and the process (skills, abilities) that represent the curriculum,
and hence both are needed for curricular validity.

In recent years, there has been growing interest in emphasising,
in instruction, the ability to apply, interpret and evaluate,
thereby de-emphasising verbalism. As such, in test construction
too, it is necessary that the instrument must attempt to measure
these abilities rather than factual information in order to have
curricular validity. But this is possible only when the objectives
of instruction go beyond the memorising of facts. When this is
done, the curricular validity of both test and instruction would
improve. We may summarise curricular validity by the
following features:


(a) Judgement is based on subjective standards and therefore
validity may be quoted as high, moderate or low.

(b) It depends on how far an item is relevant to the trait being
measured and the extent to which the whole set of items measures all
aspects of the trait. The former represents the first
approximation of the item's validity, and if the item bears a high
correlation with the total score, it is worth retaining.

(c) Adequacy of the content and objectives has to be ensured
through proper representation of each in the design
of the test, on the basis of the judgement of experts and users of
the test.
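
Feature (b) mentions retaining an item when it correlates highly with the total score. A minimal sketch of that check follows, using invented right/wrong (1/0) responses and hypothetical function names. As an assumption not made in the text, each item is correlated with the total on the remaining items, so that an item's own mark does not inflate its correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return sxy / math.sqrt(sxx * syy)

def item_total_correlations(responses):
    """Correlate each item with the total score on the remaining items.

    responses: one row per pupil, one 0/1 entry per item.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [total - row[j] for total, row in zip(totals, responses)]
        correlations.append(pearson_r(item, rest))
    return correlations

# Invented responses of six pupils to four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

for number, r in enumerate(item_total_correlations(responses), start=1):
    verdict = "worth retaining" if r > 0 else "needs review"
    print(f"item {number}: r = {r:+.2f} ({verdict})")
```

The cut-off of zero used for the verdict is only illustrative; in practice the threshold is itself a matter of judgement.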




3.5.2. Empirical validity (concurrent and predictive)
This type of validity is based on the concept that a good test
must yield scores that agree with some outside criterion such as
teachers' marks or future success on the job.


The correlation between the test scores and the criterion
scores determines the validity. When the criterion measure is
taken at the same time or nearly at the same time, it is termed
concurrent validity. For example, when teachers' rating or a
standardised test is taken as a criterion against which pupils'
scores on a teacher-made test are judged, it represents concurrent
validity or status validity. Correlation between test scores and
some measure of performance or success obtained later is termed
predictive validity. Here the criterion is future success on the
job or a performance measure at a later date. For example, when
test scores in a high school biology test are correlated with
pupils' performance later on at the medical college, or when test
scores at the final M.B.B.S. examination are correlated with success
as a doctor, it gives the predictive validity of the test used.

Concurrent and predictive validity are primarily empirical 
or statistical evaluation. For such evaluation it is essential that 
the criterion should represent the same quality, attribute or 
ability as the test, being validated, is supposed to measure. 
Most of the intelligence, aptitude and achievement tests are 
validated in this way as a valid measure of some ability, quality 
or trait. 


(a) Limiting factors

No doubt we have performance records, teachers' ratings,
standardised tests and proficiency tests which may be used as
criteria for validation; but how far are these criteria valid in
themselves? If the test scores are reliable and teachers' ratings
are a reliable and valid measure of students' performance, then
the correlation between the test scores and the teachers' ratings
will be an accurate measure of the validity of the test.
Therefore, in order to have maximum empirical validity of a
test, the criterion (teachers' rating here) must be a reliable
and valid measure, besides ensuring the reliability of the test
itself.
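
How strongly unreliability limits an empirical validity coefficient can be shown with the classical attenuation relation, under which the observed correlation equals the correlation between true scores multiplied by the square root of the product of the two reliabilities. The sketch below is an illustration only; the function name and all numerical values are assumptions.

```python
import math

def attenuated_validity(true_validity, test_reliability, criterion_reliability):
    """Observed validity expected when both measures contain random error."""
    for value in (test_reliability, criterion_reliability):
        if not 0 <= value <= 1:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_validity * math.sqrt(test_reliability * criterion_reliability)

# Assumed values: true-score correlation 0.70, test reliability 0.90,
# and teachers' ratings of decreasing reliability used as the criterion.
for criterion_reliability in (1.0, 0.8, 0.5):
    observed = attenuated_validity(0.70, 0.90, criterion_reliability)
    print(f"criterion reliability {criterion_reliability:.1f} "
          f"-> observed validity {observed:.2f}")
```

Even a test that relates well to the trait will show a modest coefficient when it is validated against unreliable ratings.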




(b) Criteria of a criterion

All criteria are partially valid because they measure only a
part of the success on the job. A test on an Engineering course
for selection of engineers gives only a particular criterion of
success as an engineer. The ultimate criterion is one's success
in life as an engineer. But this criterion of success in life, like
other criteria, is inaccessible and as such we have to be satisfied
with a substitute for it. One problem is to choose the most
valid criterion from among the available ones. We may look
for the following qualities in a criterion measure.


(i) Relevance

A criterion is relevant to the extent the score on the criterion
measure is determined by the same factors that constitute success
on the job. This is judged on rational grounds and by expert judgement.


(ii) Freedom from bias

A criterion should provide each person with the same opportunity
to make a score. It means control of working conditions
too, so that criterion scores depend on factors in the
individual rather than factors in the conditions of work.


(iii) Reliability 

This pertains to the stability or reproducibility of the measure 
of success. If the criterion performance itself is unstable then 
it cannot be predicted by any other test. 


(iv) Availability
Depending upon the cost, the time factor and convenience, the
criterion measure has to be selected for its practicability.


(c) Interpretation and use of predictive validity

The amount of validity is the degree of relationship between
the predictor and criterion scores. The size of the correlation
is a direct indication of the amount of validity. We cannot
expect perfect or even high correlations in most cases. Only
moderate correlations between the predictor test(s) and the
criterion can reasonably be expected. A validity coefficient should
be interpreted and used in applied settings in terms of the extent to
which it indicates a possible improvement in the average
quality of persons that would be obtained by employing the
instrument in question.
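
One way to read a validity coefficient as a "possible improvement in the average quality of persons" is sketched below. It rests on assumptions that the text does not make: predictor and criterion are taken to be bivariate normal, and candidates are selected strictly from the top of the predictor distribution. Under those assumptions the average criterion standing of the selected group, in standard deviation units above the mean of all candidates, is the validity coefficient times the average standardised predictor score of those selected. The function name is hypothetical.

```python
from statistics import NormalDist

def expected_criterion_gain(validity, selection_ratio):
    """Mean criterion z-score of the selected group (bivariate normal model)."""
    if not 0 < selection_ratio < 1:
        raise ValueError("selection ratio must lie strictly between 0 and 1")
    if not -1 <= validity <= 1:
        raise ValueError("validity must lie between -1 and 1")
    standard_normal = NormalDist()
    # Cut-off score above which the chosen proportion of candidates falls.
    cutoff = standard_normal.inv_cdf(1 - selection_ratio)
    # Mean predictor z-score of those above the cut-off.
    mean_selected_predictor = standard_normal.pdf(cutoff) / selection_ratio
    return validity * mean_selected_predictor

# Selecting the top 20 per cent with tests of different validity.
for validity in (0.2, 0.4, 0.6):
    gain = expected_criterion_gain(validity, selection_ratio=0.2)
    print(f"validity {validity:.1f} -> selected group averages "
          f"{gain:.2f} SD above the mean on the criterion")
```

On this reading even a moderate coefficient yields a worthwhile gain when only a small proportion of candidates is selected.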

A test having predictive validity can be used in situations 
like the following: 


(i) Prediction of events occurring at a previous date on the
basis of a questionnaire administered to adults.

(ii) A test designed to predict success in college means
forecasting, but a test used to predict brain damage is not
meant to forecast who will suffer brain damage in future;
rather who does and who does not have brain damage at
the time of test administration.

(iii) When a test is used to predict future performance, it is an
aptitude or prognostic test.

(iv) Diagnostic tests indicate sources of maladjustment and pupils'
difficulties in learning, showing thereby the existence of an
unfavourable state of affairs.


3.5.3. Construct validity
A construct is an attribute or human characteristic which is
assumed to account for some human behaviour. Constructs like
dominance, insecurity, rigidity and inability to apply are used in
psychology and help in identifying the psychological concepts they
represent. Whenever an evaluation instrument is believed to reflect
a particular construct, its construct validity must be found out. A
construct must be well defined in order to draw an inference which can
be verified. Construct validity is employed where curricular or
empirical validity cannot be used, or to supplement them. According
to Cronbach, "construct validity is involved whenever a
test is to be interpreted as a measure of some attribute which is
not operationally defined". The problem is, what constructs
account for variance in test performance? Construct validation
is essential whenever no criterion or universe of content is
accepted as entirely adequate to define the quality being
measured.

Constructs are essential for the development and testing of
theories in Science. Science is mainly concerned with developing
measures of constructs and finding functional relationships
between measures of different constructs. Constructs vary in the
extent to which the domain of related observable variables is large
or small and tightly or loosely defined. The larger the domain,
the greater is the difficulty of defining it and of allocating a
given variable to it. Measurement and validation of a construct
involves three processes:


(a) to specify the domain of observables; 

(b) to determine the extent to which all or some of those 
observables correlate with one another or are affected by 
experimental treatments; and 

(c) to determine whether or not, some or all measures of such 
variables act as though they measure the same construct. 


Relation between the observables
In an empirical investigation two approaches are followed to
see how well the measures of observables go together. In the
individual difference approach, scores are obtained on some of
the measures. Each measure is correlated with every other
measure and the correlations are analysed to find out the
extent to which different measures measure the same thing. In
the controlled experiment approach, we see how far the
treatment conditions have a similar effect on the different
measures of observables. To the degree that two measures
are affected similarly by different experimental treatments, they
are assumed to be measuring the same construct.
When we find high correlations among the measures,
it indicates that all measures measure the same thing.
Measures forming clusters indicate that a number of different
things are being measured, and a zero correlation among
measures indicates that different things are being measured by
different measures. On the basis of this evidence,
modifications can be made in the specification of the domain of
observables of a construct, besides adducing the relationship
between the two constructs.
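
The individual difference approach described above can be sketched in a few lines. The four measures, their names and the pupils' scores below are invented, and the cut-off of 0.7 for calling a correlation "high" is an arbitrary assumption. Every measure is correlated with every other, and the pairs that go together are listed so that clusters become visible.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_table(measures):
    """Correlate every measure with every other measure."""
    return {
        (name_a, name_b): pearson_r(measures[name_a], measures[name_b])
        for name_a, name_b in combinations(measures, 2)
    }

# Invented scores of six pupils on four hypothetical measures.
measures = {
    "applying_principles": [10, 12, 14, 16, 18, 20],
    "interpreting_data": [11, 13, 13, 17, 18, 21],
    "drawing_diagrams": [5, 9, 4, 8, 5, 9],
    "labelling_diagrams": [6, 9, 5, 8, 6, 10],
}

table = correlation_table(measures)
high_pairs = [pair for pair, r in table.items() if r >= 0.7]

for (name_a, name_b), r in table.items():
    print(f"{name_a} x {name_b}: r = {r:.2f}")
print("pairs that appear to measure the same thing:", high_pairs)
```

With these invented scores the measures fall into two clusters, which would suggest that two different things are being measured rather than one construct.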


3.6. Limitations in the Use of Validity 


Keeping in view the types of definitions of validity given by
various authors, it is necessary to develop some operational
definition of validity. A more concrete and realistic conception
is needed of the qualities that make a test good. Ebel has
raised the issue whether we are justified in questioning the
validity of the method of measurement. What is needed
according to him is the operational definition of quantitative
concepts and the accuracy of measurement.

Another difficulty that arises is how to know "what the
test is supposed to measure." Criterion measures are seldom
devised painstakingly when compared to the labour put into
the construction of the test.

Moreover, what is the criterion against which the validity of the
criterion measure itself is to be determined? Unless a
test is supposed to measure an ideal quantity which is
measurable directly and therefore defined operationally, there
would be difficulty in validating it.

In predictive validity, we judge a test, say T, as a measure
of a trait on the basis of the extent to which it predicts trait
C, when C is a function of T and V or many more variables
rather than only a function of T. Can we justify the quality of
a barometer merely on the basis of its accuracy in forecasting
the weather? Likewise, can we put our faith in the validation of
a test which contains items to test understanding and ability to
apply against a teacher's subjective rating in the form of
grades given on the basis of pupils' performance in biology,
arithmetic, physical education, craft etc.? The flaw is obvious.
The fallacy arises from the assumption that the relation of a
measure to one single other measure (the criterion) is all-important.
Unless it is related to other measures it is functionless.
What is needed is the meaningfulness of the test, which depends
on operational definitions of measurement procedures and on the
reliability, usability and convenience of the test in use.


CHAPTER XIII


UNIT TESTING—A DEVELOPMENTAL 
APPROACH 


The National Policy on Education (1986) envisages continuous
comprehensive evaluation, making evaluation an integral
part of instruction so as to provide regular feedback to students
about their adequacies and inadequacies in learning. This can
be ensured by using various devices, among which a unit test is
one of the most potent information-gathering devices that a class
teacher can use to measure students' learning and diagnose
difficulties. But it requires technical competence on the part of
teachers to develop such unit tests and use them effectively for
various purposes.

It is, therefore, expected that after going through this
chapter a teacher should be able to:


(a) understand the concept of a learning unit and a unit test,

(b) identify various types of content elements from a learning
unit,

(c) formulate specific objectives of the unit of teaching/
learning,

(d) develop a blue-print of a unit test based on a learning unit,

(e) select or construct the questions appropriate to the blue-print,

(f) undertake question-wise analysis to check and verify the set
of questions in terms of the blue-print developed,

(g) appreciate the need for different varieties of unit tests to be
used as oral or written devices, and

(h) use the different unit tests for various purposes.


1. The Unit Concept 


A unit is sometimes regarded as equivalent to a particular
topic of the syllabus, or it may represent a combination of two or
more topics which are related. However, instead of defining a
unit in terms of topics it may be better to understand it in terms
of related learning experiences. In that case it may be an
organisation of related learning experiences which form a unitary
whole. Some people try to differentiate
between the units of experience and the units of subject matter.
It may be difficult to maintain this dichotomy when we regard
subject matter as recorded experience and as a source of learning
experiences. Nevertheless, one must visualize some source of
unity round which learning experiences may be organised. From
this point of view, a unit is something more than a mere lesson
or period of activity or an arbitrary division of experience. It is
in fact a number of experiences organised in
relation to some of the many possible sources of unity. This
source of unity may relate to the organisation in planning
the curriculum as well as to classroom development.
Depending upon pupils' natural and social environments, a
unit may be titled, for example, "The town water supply", "The school
garden", "The local park", "The city zoo", "A horticulture
farm", etc. Likewise, themes, problems, interests, processes etc.
may form the basis of unit formation. Selection and planning
of the units are in general determined by the design and
general framework of the curriculum. The choice of the unit
is influenced by the pattern of the curriculum design. In a
subject-centred curriculum, the logical unit is the basis for division of
the subject involved, whereas it is derived in terms of the
maturity of the learners if it is based on the areas of living. If
designed on the basis of the experience of the learners, the unit
may be organised around cluster




subject matter which is teachable in approximately a week's 
time or a fortnight. (1) 


2. Concept of a Unit Test 


A unit test is a written, oral or practical exercise based on the
objectives and content of the unit taught and used as a formal
or an informal device for measuring students' achievement in
order to improve it through feedback. Thus, a unit test is


(a) written, oral or practical. 

(b) formal or informal device of measurement. 

(c) based on specific content of a unit. 

(d) limited to the testing of specific objectives of the unit. 

(e) administered after the completion of the unit. 

(f) meant for diagnosis of pupils’ achievement through feed- 
back of results. 


A unit test is a question paper in miniature. There is not
much difference between the two except in their scope and
purpose. The main purpose of a question paper is the
measurement of achievement and that of a unit test is the
improvement of pupils' achievement. A unit test, unlike a question
paper, may be mono-objective or multi-objective, but a question
paper is always multi-objective. The coverage of the content is
intensive in a unit test while it is extensive and selective in the
case of a question paper. A unit test may or may not include
all the forms of questions and may have any number of
questions depending upon the length of the unit of teaching.
The time is also not fixed. It may vary from 10 minutes to an
hour depending on the length of the unit, in contrast to the
usual time of two to three hours in a question paper. The
results of a question paper are used mainly for grading
students. In the case of a unit test, the results are used for diagnosis
and improvement of achievement through regular feedback.


3. Types of Unit Tests 


Unit tests may be of different types. There is no rigid and fast
demarcation. Depending upon the purpose of the unit, its scope
and the nature of the subject matter to be tested, unit tests may
be prepared in various forms. A unit test may be of 10 minutes'
duration or of an hour. However, it is presumed that generally a
unit test would not exceed one teaching period. In the case of
bigger units, more than one unit test may be developed. We
may have unit tests of different varieties like the following: (2)


(i) A unit test may contain only objective-type and short-ans- 
wer type questions where the subject-matter is such that it is 
not possible to cover and test adequately the content 
through one type of questions and long answer questions 
are not warranted. 

(ii) A unit test may contain only essay-type and short-answer
type questions where it may be difficult to have objective-type
questions and at the same time it is not possible to
ignore the essay-type question to test certain abilities
like summarisation, integration, expression etc.

(iii) A unit test may contain only short-answer type questions
which may range from 5 to 20 in number. Such a unit
test may be used when the unit does not demand extended
responses with respect to the various learning points and
at the same time there is no facility for using objective-type
questions. Ease of construction and lack of training
of students in attempting objective-type questions could be
other considerations.


(iv) A unit test may contain only objective-type questions.
Such a test may be usable where cyclostyling facilities
are available in the school and the unit to be tested lends
itself better to objective scoring. It may also be profitably
used where a unit is very long and multiple-choice type
questions may serve better for adequate sampling.

(v) A unit test may also be framed periodically, say
quarterly, to test the various mental processes developed
during teaching. In such a unit test, questions of any type
may be included but all the specific intended outcomes
implied in the objectives are tested. The results of such
unit tests are very useful for improving instruction by
diagnosing certain abilities which hitherto have not been
developed.




(vi) Sometimes it is not possible, for want of time, to administer
a separate unit test for each unit. In such a case, two
or three small units may be clubbed together for testing.
This is desirable only when the units are closely related.
Apart from the economy of time, it is possible to judge
common features and distinguishing features involving
related concepts in the units under testing. For
example, units of teaching, say, photosynthesis and
respiration can be covered by a single unit test.

(vii) A unit test may also be based on a single diagram and 
questions testing different abilities may be put on that 
very diagram. Such a test may even be orally used with 
the class for review purposes. 

(viii) Sometimes a unit test may be based on a single table. For
example, in some unit tests on genetics, a table may be
given with serial numbers in each compartment and the
students may be asked questions only in terms of the serial
numbers in each compartment given in that table. Such a
unit test is very handy for use and can also be used orally
to review a learning unit.

(ix) A unit test may sometimes be prepared in such a way
that a simple, two-dimensional chart may be given with
some entries in it. Students may be asked to fill in the
rest of the entries. Such a unit test may sometimes be
used to test basic understanding of biological facts necessary
for learning a new lesson. Before taking up new
lessons, previous knowledge of the students can be tested
by giving such a chart to them for filling up. This chart
will be quite useful for understanding the weaknesses of
students with respect to the basic facts, which can
become the basis for remediation before going ahead with the
next unit.

(x) It may be necessary sometimes, especially in the case of a 


unit which by and large is difficult for an average student 
to follow, that a unit test be fr 




difficulty to follow. The results of such a unit test may
highlight or pinpoint the area of weaknesses and serve as
a diagnostic test with limited use, with reference to a
particular topic in the class concerned. Such a test
would only include questions on the basic understandings
required to comprehend the major concepts involved in
that unit. True and false statements, cyclostyled on a
sheet of paper, can be profitably used in such cases.


4. Blue-Print of a Unit Test 


4.1. Three Dimensional Grid 


Most trained teachers do know what a blue-print is, especially
those trained in paper-setting workshops conducted by the
N.C.E.R.T. or by state educational agencies. Nevertheless,
only a few really understand the rationale behind the development
of a blue-print and fewer still can apply their knowledge
of blue-prints to develop a good, workable blue-print. The
two-dimensional grid or blue-print, representing the objectives and
the sub-units, is quite common but the three-dimensional blue-print,
which is an N.C.E.R.T. innovation, is a really useful device to
ensure the needed validity and reliability of a unit test. A
blue-print as envisaged in the N.C.E.R.T. literature


(a) is a three-dimensional grid, 

(b) shows the nature and distribution of questions over different 
sub-units, 

(c) indicates the weightages given to various forms of questions, 

(d) pin-points the number of questions with marks, to be set 
testing different objectives in relation to different sub-units, 

(e) shows weightages given to various sub-units,

(f) reflects the emphasis given to various objectives, and 

(g) depicts the structure and format of the intended test. 
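
The three dimensions listed above (objectives, sub-units and forms of questions) can be held in a simple data structure from which the weightages follow by addition. The sketch below is only an illustration: the sub-unit labels, the numbers of questions and the marks are invented, and the function names are hypothetical.

```python
from collections import defaultdict

# Each cell of the grid:
# (objective, sub_unit, question_form, number_of_questions, marks_per_question)
BLUEPRINT = [
    ("Knowledge", "Sub-unit 1", "Objective", 4, 1),
    ("Knowledge", "Sub-unit 2", "Short answer", 2, 2),
    ("Understanding", "Sub-unit 1", "Short answer", 3, 2),
    ("Understanding", "Sub-unit 2", "Essay", 1, 5),
    ("Application", "Sub-unit 1", "Short answer", 2, 2),
    ("Application", "Sub-unit 2", "Objective", 2, 1),
]

# Position of each dimension within a cell.
DIMENSIONS = {"objective": 0, "sub_unit": 1, "form": 2}

def total_marks(blueprint):
    """Total marks of the intended test."""
    return sum(count * marks for _, _, _, count, marks in blueprint)

def weightage(blueprint, dimension):
    """Marks allotted to each label along one dimension of the grid."""
    index = DIMENSIONS[dimension]
    marks_by_label = defaultdict(int)
    for cell in blueprint:
        marks_by_label[cell[index]] += cell[3] * cell[4]
    return dict(marks_by_label)

for dimension in DIMENSIONS:
    print(f"Weightage by {dimension}:")
    for label, marks in weightage(BLUEPRINT, dimension).items():
        share = 100 * marks / total_marks(BLUEPRINT)
        print(f"  {label}: {marks} marks ({share:.0f}%)")
```

Checking that the three sets of weightages each add up to the same total is a quick way of verifying a blue-print before questions are written.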


4.2. The Underlying Assumptions


While preparing a blue-print, it is assumed that the teacher has
already made all-out efforts to develop various abilities (as
implied in an objective). Accordingly, a lower weightage to the
Application objective, as compared to the Understanding and
Knowledge objectives, shows that in teaching more emphasis
was laid on the latter two. Another assumption may be that the
unit under testing is less amenable to the Application objective
as compared to the other two objectives. Similarly, the
assumption underlying differential weightage to various sub-units is
that one sub-unit is more important or has more potential than
the others and encompasses more concepts, principles or other
content elements than other sub-units. Likewise, the use of one
form of questions in preference to another in a particular
sub-unit is based on the assumption that the nature and
magnitude of the content in that sub-unit demands that
particular form to test effectively the desired objective, or on
the facility of construction of that variety, say multiple-choice,
from that content area. Similarly, the weightage given to
different forms of questions assumes that the tester is trying to
maintain a reasonable level of scoring objectivity. Overemphasis
on a particular variety of question or underemphasis of
another variety in relation to another sub-unit is based on the
assumption that the content elements of that unit are more
amenable or less manipulative to that variety to test the desired
objective as indicated in the blue-print.

Thus a blue-print, when prepared, does involve certain 
assumptions based on the limitations and uncertainties of the 
testing possibilities. Some of these assumptions can be question- 
ed by another teacher evaluator and, therefore, he can develop 
another blue-print on the same unit, which may be quite 
different in format and structure from the proposed unit test. 


Before starting a blue-print, it is thus essential that the teacher
becomes cognizant of the content elements of that unit to judge
the nature and scope of the unit. This ensures better distribution
of questions of different types to test various objectives in
relation to the different sub-units delineated at the beginning.


4.3. Methodology of Preparing a Blue-Print 


Looking at any of the ready-made blue-prints may give the idea
that it is only a matter of filling up the boxes according to
one's (the tester's) will. In fact it is not so. Any blue-print
which is prepared without cognising the intended anatomy of a unit
test is sure to create difficulties while framing the questions.
An application question inserted against a sub-unit might be
difficult to frame because the content elements therein do not
lend themselves to such a question. Likewise, a multiple-choice
question shown against a particular sub-unit testing a specific
objective may not be possible, but a short-answer question may
be easier to frame for the same purpose. Thus, while making
entries during blue-printing, care must be taken that one has a
clear picture of the problems and possibilities of framing a
particular form of question in a sub-unit testing the intended
instructional objective.

For getting an insight into the architecture of a blue-print, a
lot of practice is needed. Though no specific rules can be given
for the methodology of developing a blue-print, yet on the basis
of my experience I venture to provide the following steps as
guidelines. They assume that the teacher or evaluator is familiar
with the content elements of the unit concerned and with the
instructional objectives which are testable in a written test,
i.e. Knowledge, Understanding, Application and Skill (only
drawing), and that the unit can be taught within a week or two.
For illustration purposes, a sample blue-print on a unit, "A
Cell", for class IX based on a State syllabus is given in the
form of an annexure for reference.


Steps: 

(a) Delineate the unit into sub-units, each an integrated chunk
of content which can be taught almost independently as a
module. Depending upon the nature and size of the unit, we
can divide a unit into three to five or six sub-units. With
fewer than three, one may not be able to ensure adequate
coverage of the various content elements, as it is quite
likely that the tester will overlook significant content
elements; this is far less likely when 4, 5 or 6 sub-units
are identified. On the other hand, if we have more than 5 or
6 sub-units the blue-print becomes quite cumbersome, like
that of a full-fledged question paper. Moreover, this leads
to low weightages to the various sub-units, which might
prevent inclusion of essay-type questions (when content
demands it), which usually carry not less than 4 or 5 marks.
Thus from the point of view of practicability and to ensure
the needed coverage of content, 3 to 5 sub-units may be
considered desirable. (5 in the sample)

(b) Give proportionate weightage to various sub-units depend- 
ing upon the significant facts, concepts and textual details 
forming the magnitude or bulk of content in that sub-unit. 
(2, 5, 7, 6, 5 in the sample) 

(c) Formulate or identify the objectives (K-U-AP-SK) which 
can be tested using the content of the unit. All the four 
objectives may not be necessary for every unit. Some may 
demand two, others three and some only one. (K-U-AP and 
SK in the sample) 

(d) Assign proportionate weightages to each of the objectives
identified under step (c). (8, 7, 5, 5 in the sample)

Weightage to the various objectives depends on:

(i) amenability of content to testing of different objectives.
(ii) the use to which the test results will be put. If a
remedial programme is to be organised, then more weightage
has to be given to the knowledge objective, or it may be
an exclusively knowledge-based unit test.
(iii) previous performance of students on these objectives.
(iv) intended learning outcomes to be used as criteria or
standard of performance.


(e) Decide the approximate time to be allotted for the unit
test. It can vary from 10 mts. to 1 hour depending upon
the size of the unit. However, one teaching period may
usually be considered appropriate for a unit involving 5 to
10 teaching periods. (40 mts. in the sample)

(f) Decide the total marks to be allotted to the unit test. This
would depend on the number and form of questions to be
included. However, in reality we have to fix the time limit
first, arbitrarily, say 30-40 mts., and then decide the
number of questions and the marks they carry (25 marks in
the sample).

(g) Decide the number and form of questions to be included in
the test (E, S.A., V.S.A., M.C. in the sample) and give
proportionate weightage, depending upon the nature of the
content and the familiarity of students with one or the other
form. A unit test may have one or more types of questions,
with some variation in the weightages given to each type.
(5, 10, 5, 5 in the sample)


(h) Now start filling in the boxes of the grid as under
(refer to the sample in the annexure):

(i) Insert the total marks in the bottom right-hand corner
box, as indicated by (a) in the blue-print. Then enter
the sub-unit-wise and objective-wise marks in the total
column and the total row.

(ii) Then decide about the essay-type questions, which
can be given only where the weightage to a sub-unit is
sufficient and the content demands a long-answer
question. Moreover, certain essay-type questions
include a part requiring students to draw a diagram. In
such cases, set the marks apart for the skill part of
the essay question under the skill column, in the same
row against the same sub-unit, as shown in sub-unit-3
(3 marks for skill). This helps the teacher to apportion
the remaining weightage of the skill objective to the
other question/s to be set, testing skill. The figures
inside the brackets indicate the number of questions to
be set and the figures outside indicate the total marks
for the question/s. In the sample, against sub-unit-3
under essay-type, 2 marks and 3 marks are shown for K
and SK respectively. Since the skill part is included in
the same question under essay type, the number of
questions is not shown, e.g. 3(-), in column 'E' under
the skill objective, so that only one question is
represented (indicated by (b) in the blue-print).

(iii) The next step is to fill in the skill column, so that
whichever sub-unit is more amenable to testing of the
drawing skill can be used for allotting skill-based
questions. Sometimes it is necessary to frame a
full-fledged question testing skill independently. This
is possible only if we first identify the sub-unit
which admits the skill-based question, as indicated
against sub-unit-2 under S.A. in the column of the skill
objective (indicated by (c) in the blue-print).
(iv) Once skill questions are inserted, proceed to
allocate Application questions. The simple reason
for this is the inherent difficulty of framing such
questions on every sub-unit. Therefore, we should
identify the relevant sub-unit, having more potential
than the others for framing questions testing the
Application objective. This prevents the difficulty
that may arise later in converting a Knowledge or
Understanding question to an Application question,
which is not always possible. Likewise, content
elements in that particular sub-unit/s may be more
amenable to the multiple-choice than to the short-answer
variety of question for testing the Application
objective. Sometimes all the sub-units can be used
for testing the Application objective, although they
may not admit all forms of question (indicated by
(d) against sub-unit-2 under the Application objective).
(v) Once the distribution of Skill and Application
questions is over, the task becomes easier, as
conversion of Knowledge-based questions to the
Understanding type or vice-versa is no problem.
Nevertheless, it is suggested that we now proceed
sub-unit-wise, making entries simultaneously under the
Knowledge and Understanding objectives. Care has to be
taken that the decision about allocating objective-type
questions precedes that about short-answer questions.
Likewise, distribution of questions under the objective
'Understanding' should precede the Knowledge objective,
so that the choice of content elements is made for
higher level objectives before lower level objectives.
Thus, entries in all the sub-units are made sequentially
as shown in the sample test (marked as e, f, g, h, i and
j). Thus (a) to (j) indicate the sequential steps for
developing a blue-print.
Distribution of questions over K and U against each
sub-unit is given below:

Sub-unit      Understanding (U)        Knowledge (K)
Sub-unit-1    V.S.A. 1(1)              V.S.A. 1(1)
Sub-unit-2    M.C. 1(1)                Nil
Sub-unit-3    Nil                      V.S.A. 1(1)
Sub-unit-4    S.A. 2(1), M.C. 1(1)     S.A. 2(1)
Sub-unit-5    S.A. 2(1), M.C. 1(1)     V.S.A. 1(1)

(vi) More often than not, some adjustment in the
distribution of questions under the Knowledge and
Understanding objectives becomes necessary in one
sub-unit or two to ensure the correct weightage to
sub-units and objectives. This is inevitable unless
one deliberately goes ahead keeping in view only the
adjustment of marks and not the content elements
of the sub-units.

(vii) Insert row-wise and column-wise sub-totals indicating
the number of questions and the marks for those
questions. Check the totals on both sides to see that
they are the same (25 in the sample). The blue-print
is now ready.

(viii) Check whether, in the blue-print, the questions are
well spread over the various sub-units and are not
crowded at one place. See that, by and large, different
sub-units are not restricted to one type of question
only, although in a particular sub-unit it might be
inevitable.
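The check on totals described in the steps above can be sketched in a
few lines. The three lists of weightages are those quoted for the
sample in steps (b), (d) and (g); the function name
`inconsistent_dimensions` and the dictionary layout are assumptions
made for illustration only.

```python
TOTAL_MARKS = 25  # total marks of the sample unit test

# Weightages quoted for the sample blue-print in steps (b), (d) and (g).
weightages = {
    "sub-units": [2, 5, 7, 6, 5],
    "objectives (K, U, AP, SK)": [8, 7, 5, 5],
    "forms (E, S.A., V.S.A., M.C.)": [5, 10, 5, 5],
}


def inconsistent_dimensions(weightages, total):
    """Return the names of dimensions whose weightages do not add up to total."""
    return [name for name, marks in weightages.items() if sum(marks) != total]


problems = inconsistent_dimensions(weightages, TOTAL_MARKS)
print(problems or "all three dimensions add up to the total")
```

If any list is returned, the marks in that dimension need adjusting
before the boxes of the grid are filled in.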


4.4. Using a Blue-Print 


Once the blue-print is ready, framing of questions on each
sub-unit becomes possible. It provides the basis for framing
different types of questions, testing different abilities, using
specific content. Thus the construction of each item is focussed
on a pre-determined objective, thereby ensuring that all questions
are objective-based and cover the content adequately.


More than one blue-print can also be prepared on the same
unit. Two unit tests prepared on the same unit of teaching,
when administered to the same class, can give an idea about
the validity of students' performance. The extent to which the
performance of students on the two unit tests goes together would
indicate the degree of equivalence of the two tests. If two or
three questions are framed on each of the questions indicated in
the same blue-print, one can develop parallel unit tests. If
certain absentees or remedial-group students are to be retested
on that particular unit, the parallel unit test prepared on the
same blue-print can be profitably used. When two versions of the
same unit test (based on the same blue-print), i.e. parallel unit
tests, are administered at the same time, the results may be used
for determining the reliability of the unit test.
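One common way of expressing how far two sets of scores "go together"
is the product-moment correlation coefficient. The text does not
prescribe a particular index, so the choice of this coefficient is an
assumption here, and the scores of the six pupils below are invented
purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores on each test must vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical marks of the same six pupils on two parallel unit tests.
form_a = [18, 12, 22, 9, 15, 20]
form_b = [17, 13, 21, 10, 14, 19]

r = pearson_r(form_a, form_b)
print(round(r, 2))
```

A coefficient close to 1 would suggest that the two versions rank the
pupils in much the same way; a low value would call for a second look
at one or both tests.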

A study of different blue-prints prepared by a teacher from
time to time on various units of the syllabus would throw light
on the instructional strategies of the teacher regarding the
emphasis laid on the various objectives and the types of
questions used. Do his/her blue-prints reflect almost the same
weightage to instructional objectives on all units? Do higher level
objectives receive more attention towards the later part of the
session? Does he/she use only a particular form of question or
practise all forms of questions? Thus the study of blue-prints
used by a teacher over a period of time gives an insight into the
emphasis laid by the teacher on various instructional objectives.
If blue-prints prepared by different teachers for a particular unit
test or on various units are compared, interesting inferences
can be drawn, on the basis of which guidelines can be provided
to the teachers for developing blue-prints envisaging
comparability of students' performance.

The blue-print of a unit test provides a scientific basis for
the construction of a unit test. It takes care of adequate
coverage of content and objectives to ensure the needed validity.
Questions developed on such blue-prints by different teachers or
by the same teacher over a long period can ultimately lead to
the development of a good question bank comprising objective-based
questions on different units of the syllabus. When this
stage comes, after a good deal of practice of framing questions
on pre-designed blue-prints, the necessity of developing
blue-prints for every unit may not be felt. The ultimate aim of
developing the blue-print is to take cognizance of the validity
and reliability of the unit test, and not merely to fill in the
boxes with any type of question under any objective and
sub-unit.


5. Framing of Questions 


Once the blue-print of the unit test is ready, the next step is to
construct questions of various types, testing different objectives
as indicated in the blue-print against sub-units or topics. The
quality of these questions is to be maintained as discussed in the
previous chapter.


6. Editing and Consolidation 


Each question needs to be edited before it is finalised. This
editing includes ensuring that the intended answer agrees with
the model answer prepared at the time of construction of the
question. It also includes verifying the key (answers) as well as
the format of the question, besides checking the relevance of
each question to the objective it is intended to test. This is
followed by consolidation of questions in a pre-determined order,
which may be in terms of content, objectives, form of questions or
difficulty level of questions.


7. Instructions for the Examinees 


When the unit test is ready, general instructions are developed
which provide the needed information and guidelines for the
examinees to take the test wisely.

A sample unit test along with a blue-print is given in
Annexure-A.


8. Using the Unit Tests 


Since evaluation is an integral part of teaching and learning,
students are observed continuously in various situations where
measurement of their achievement may be possible. For this
purpose, various tools and techniques are used. The unit test is
one such tool.


In a yearly plan of instruction, a teacher has to divide the
prescribed syllabus into various learning units. A learning unit
comprises topics which have some sort of continuity and
coherence. While deciding about a teaching unit, the teacher
has to take into consideration the achievement of the class, the
time factor, the emphasis given to the topics and the
weightage given to them in the syllabus. He has to determine the
instructional objectives of the unit also. When the instructional
programme of this particular unit is over, the teacher has to
appraise the students' achievements in terms of the instructional
objectives specified at the beginning. For this purpose he has to
develop a formal or informal test, which may be called a unit
test for that particular unit. Such unit tests, when properly
constructed and used, serve a number of useful purposes, some
of which are given below: (4)


(i) For measuring achievement 

Such tests can be used as classroom tests for testing the
achievement of students. These may be administered after
completion of every unit of teaching. This enables the teacher to
know the rate of progress of students as well as the extent of
attainment of the objectives of the unit.


(ii) For diagnosis of students' weaknesses 

The analysis of the test results may provide evidence of poor
achievement in certain areas by an individual or a group of 
students. The teacher can identify weak students and areas of 
their weaknesses. This helps him to adopt remedial measures 
for the individual or the group to facilitate further learning. If 
need be, he can frame a diagnostic test on the hard spots of 
learning to find out the exact nature of difficulties. (5) 
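Such an analysis can be sketched as a comparison of the marks a pupil
obtains in each sub-unit with the marks allotted to it. The marks
allotted below follow the sub-unit weightages of the sample blue-print
(2, 5, 7, 6, 5); the pupil's marks, the function name `weak_areas` and
the cut-off of 40 per cent are assumptions chosen only for
illustration.

```python
def weak_areas(marks_obtained, marks_allotted, threshold=0.4):
    """Return sub-units where the share of marks obtained is below threshold."""
    weak = []
    for sub_unit, allotted in marks_allotted.items():
        obtained = marks_obtained.get(sub_unit, 0)
        if allotted > 0 and obtained / allotted < threshold:
            weak.append(sub_unit)
    return weak


# Marks allotted per sub-unit (sample weightages) and one pupil's
# hypothetical marks on the same sub-units.
allotted = {"Sub-unit 1": 2, "Sub-unit 2": 5, "Sub-unit 3": 7,
            "Sub-unit 4": 6, "Sub-unit 5": 5}
pupil = {"Sub-unit 1": 2, "Sub-unit 2": 1, "Sub-unit 3": 5,
         "Sub-unit 4": 2, "Sub-unit 5": 4}

flagged = weak_areas(pupil, allotted)
print(flagged)
```

The same function applied to class averages instead of one pupil's
marks would point to hard spots common to the group.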


(iii) For accelerating achievement

On the basis of the test results, the teacher can also locate
certain areas where better achievement by the students
is possible. The teacher can then concentrate his efforts on an
individual or a group where achievement of excellence in that
area would be possible. In fact, exceptionally high achievement
of certain students gives a clue to the need for an enrichment
programme.


(iv) For improving instruction and efficiency 

The tests not only help a teacher in locating the strong and weak
points of the students but also help in discovering the strong
and weak points of his methodology and of his understanding of a
certain concept. A defective method adopted by the teacher becomes
recognizable by the negative response from the students. In
such cases, the teacher can modify his method of teaching or
strategy of unit planning. There can be other situations where
even backward students are able to understand and grasp some
difficult points because of the use of more appropriate methods.
Here the teacher can develop confidence in his methods.


(v) For motivating the learners

If unit tests are given at regular intervals, they act as good 
incentives for students to evaluate their progress regularly and 
also to improve their performance further. They also motivate 
students to develop regular study habits. A well-planned use of 
unit tests at regular intervals discourages selective study on the
part of students and provides hints for remedial measures. 


(vi) For instructional purpose as teaching aids

Various questions of the unit tests can be used as good teaching
devices. Short-answer type questions are of special significance
here, as they lend themselves better to such use on various
occasions: before, during or after instruction. Essay and
objective type questions, especially the multiple-choice type,
can also be used profitably if they are thought-provoking.
Short-answer questions can be used in pre-testing, development of
a lesson, review of a lesson and for diagnosing, in a rough and
ready manner, the difficulties of pupils. (6)


(vii) For pupil's assignments 

Instead of using a unit test as an instrument of evaluation, it
can be used in full or in part as a home assignment. It is
convenient for the teacher to pick some questions from the
unit tests available and use them as a school or home assignment.
As unit test questions are well chosen, in accordance with the
requirements of the unit, the teacher need not frame fresh
questions every time. He can make use of selected questions.
Some of the questions can even be utilised for undertaking
small projects.


(viii) For students’ self-evaluation 

As every unit test carries with it a key for objective-type
questions and outline answers for each short-answer and
essay-type question, it can be used by the student for self-evaluation.
But this is possible only when the school maintains a library of 
unit tests and a question bank which is available to the 
students. When available, students can pick up questions from 
the unit test, write answers and compare them with the given 
key or outline answers. This helps them to know their strengths 
and weaknesses at any time. 


(ix) For developing a question bank 

When a unit test on each unit of the syllabus is prepared by the
teacher, he will have, in course of time, a good number of unit
tests on all the units of teaching. This will lead to the
development of a good pool of unit tests. This pool will continue
to be augmented year by year and updated from time to time. Such a
pool can be taken advantage of by all subject teachers for use in
their evaluation programme. Each question from these unit tests
can also be transferred to specially prepared cards of about
5" x 8" size. Such cards have the question on one side and the
key or outline answer on the reverse side. By and by a good
pool of questions on these cards is developed. Such pools of
unit tests and individual questions can be developed for
different subjects and for different classes. There can be more
than one unit test developed on the same unit. Keeping in view the
limitation of time and the technical competence needed for
construction of good questions, the need for supply of ready-made
questions to the teachers in the form of an item bank or unit
test library cannot be over-emphasised.
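A card of the kind described above, with the question on one side and
the key on the reverse, can be sketched as a small record. The field
names, the helper `pick` and the two sample cards are assumptions made
for illustration; they do not reproduce any actual question bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionCard:
    unit: str
    sub_unit: str
    objective: str   # e.g. Knowledge, Understanding, Application, Skill
    form: str        # e.g. E, SA, VSA, O
    marks: int
    question: str    # front of the card
    key: str         # reverse of the card: key or outline answer


def pick(bank, **criteria):
    """Return the cards whose fields match every given criterion."""
    return [card for card in bank
            if all(getattr(card, field) == value
                   for field, value in criteria.items())]


# Two hypothetical cards; the question and key texts are placeholders.
bank = [
    QuestionCard("Man and Food Supply", "Food webs", "Knowledge", "O", 1,
                 "(question text)", "(key)"),
    QuestionCard("Man and Food Supply", "Food supply", "Understanding", "SA", 2,
                 "(question text)", "(outline answer)"),
]

short_answer_cards = pick(bank, form="SA")
print(len(short_answer_cards))
```

Classifying each card by unit, objective and form in this way is what
allows questions to be drawn later against the boxes of a blue-print.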


(x) For reinforcement and feedback 
Continuous use of unit tests by teachers and reporting of results
to students is essential. Results of unit tests should be
communicated as early as possible. This serves two purposes: one
of reinforcement of learning, and the other of feedback for both
teachers and students. Continual feedback from unit tests helps in
acquainting students with the adequacies and inadequacies in their
learning. This provides motivation on the one hand and develops a
positive self-concept on the other by recognising their achievement.


ANNEXURE

Blue-print

Subject: Biology    Unit: Cell    Class: IX    Time: 40 mts.

[Three-dimensional grid, printed sideways in the original and not
legible in this copy. Columns: the objectives Knowledge,
Understanding, Application and Skill, each divided by form of
question (E, SA, VSA, O), and Total. Rows: the sub-units (content)
and Total.]

Summary: Essay (E), Short Answer (SA), Very Short Answer (VSA),
Objective (O). Total = 16 questions. Total = 25 marks.

Notes: Figures within brackets indicate the number of questions and
figures outside the brackets indicate marks.
* Denotes that marks of two parts of the same question have been
combined.


Sample Unit Test

Unit: Man and Food Supply    Class: X
Maximum Marks: 30    Time: 40 mts.

BLUE-PRINT

[Three-dimensional grid, printed sideways in the original and not
legible in this copy. Columns: the objectives Knowledge,
Understanding, Application and Skill, each divided by form of
question (E, SA, VSA, O), and Total. Rows: 1. Population
distribution, 2. Population growth, 3. Food webs, 4. Food supply,
and Sub-totals.]

Notes: Figures within brackets indicate the number of questions and
figures outside denote the marks.
* Denotes that marks have been combined in the same question testing
two different objectives.

Summary (no. of questions and marks allotted): Essay [E], Short
Answer [SA] 8 questions, 16 marks, Very Short Answer [VSA],
Objective [O]. Total 18 questions, Total 30 marks.

Scheme of options: No option is provided.
Scheme of sections: Only one section.


MAN AND FOOD SUPPLY 


Class :X Time : 40 mts. 


M. Marks : 30 


Instructions

I. All questions are compulsory.

II. Answers are to be written in the answer book.

III. Questions 1 to 9 have 4 suggested answers A, B, C and D,
only one of which is correct. The letter indicating the
correct answer should be written in the answer book. Each
carries one mark.

IV. Questions 10 to 17 carry 2 marks each and answers should
not exceed 10-15 words.


TEST 


1. Which of the following trends of population growth rate is
seen in Western Europe?

A. Births equal deaths
B. Births lower than deaths
C. Births higher than deaths
D. None of the above

2. Which of the following is the major function of a food
chain?

A. Food supply to other organisms
B. Constant supply of energy
C. Change in types of foods
D. Energy transfer from lower to higher organisms

3. Urban population in developing countries increases
because of

A. higher birth rate in cities.
B. rural to urban migration.
C. greater opportunity of urban employment.
D. lower death rate in cities.


4. The relationship between green plants and producers is
similar to the relation between bacteria and

A. producers.
B. consumers.
C. decomposers.
D. nutrients.

5. Productivity of forest land is higher than that of cultivated
land because

A. cultivated lands have loose soil.
B. forests replenish nutrients.
C. cultivated lands are tilled.
D. forests retain moisture.

6. To check the decrease in the density of population where
mostly adivasis are concentrated, which of the following
important measures would you take?

A. Bring more land under cultivation by clearing forests.
B. Build huts and roads for adivasis.
C. Open dispensaries and schools.
D. Increase land under settled agriculture.


7. Statements

I. As technology improves, man's ability to adapt his
environment increases.

II. The percentage share of employment in resource
industries like grazing, mining and farming is shrinking
in the U.S.A.

Direction

Indicate which of the following BEST describes the
relationship between the two statements given above.

A. The situation in statement II contradicts the principle
in statement I.
B. The situation in statement II neither contradicts nor
can be explained by the principle in statement I.
C. The situation in statement II can be explained by the
principle in statement I.
D. The situation in statement II is consistent with
statement I.

8. Productivity is lower in the grasslands than in the forests
because of

A. fewer producers than consumers.
B. fewer producers than decomposers.
C. more producers than consumers.
D. more producers than decomposers.

9. Which of the given changes in food habits will favourably
affect the capacity of land to support a number of people?

A. Less consumption of plant and animal foods.
B. Equal consumption of plant and animal foods.
C. More consumption of plant foods.
D. More consumption of animal products.


10. 'Future pattern of high density of population areas will be
largely dispersed all over the world': Why?

11. Population growth rate has been 0.1% per annum with the
practice of settled agriculture. Why did this growth rate
increase only after the onset of settled agriculture and not
prior to it?

12. Why is grazing on open pastures less productive for the
rearing of cattle for beef? What is the better method which
is practised now?

13. Explain why commercial fishing has become the main
economic pursuit of N.W. Europe, Japan and China in
preference to beef cattle rearing.

14. Distribution of population in terms of number per unit of
land is not the only aspect of population studies. Suggest
two other aspects which may also explain the real
phenomenon of population distribution.

15. The world is finite, resources scarce, oil is going, ores
depleted. Man is far too enterprising. Fire will rage with
man to fan it. Soon we'll have a plundered planet. People
breed like fertile rabbits, people have disgusting habits.
Moral:
The evolutionary plan
went astray when evolving man.

(Kenneth Boulding)

Suggest two measures, in response to this conservationist's
complaint, to preserve our planet from being depleted.


16.

[Figure: pyramid of numbers, from top to bottom]
IV   TERTIARY
III  CARNIVOROUS
II   HERBIVOROUS
I    PRIMARY PRODUCERS

In the pyramid of numbers given above, which level will have to
be reduced in order to ensure an ecological balance of
producers?

17. Read the statements given below and answer the question.

I. A rate of increase of 3 per cent is necessary to maintain
an adequate level for the entire world population in
2000 A.D.

II. Per capita income improvement has been minimal in the
last decade; therefore, the deficiency gap continues to
widen as one enters the stage of maximum population growth.

What is the main difference in the approaches of these two
hypotheses which explain the effect of population growth on
economic growth?

18. What are the main trends of population growth rates in
1. Australia, 2. Asia, 3. Europe? How does this affect the
age structure of population in the economically productive
age group in these countries?


Scoring Key and Marking Scheme

[Printed sideways in the original. The key for questions 1 to 9 is
not legible in this copy.]

Q. No.  Key / expected outline answers               Part-wise marks

10. (a) With the opening up of new areas.
    (b) Availability of new resources and occupation.     2 x 1 = 2

11. (a) Lowering of death rate.
    (b) More births consequently and increase in
        population growth.                                2 x 1 = 2

12. (a) The beef cattle on open pastures do not increase
        the productivity of soils as there is no return
        of nutrients.
    (b) It is easier to grow the crops to feed the
        animals.                                          2 x 1 = 2

13. (a) Lack of arable land and low productivity of land.
    (b) Availability of good coastal fishing areas.       2 x 1 = 2

14. (a) Information on urban and rural population
        break-up in the same regions.
    (b) Male-female break-up.                             2 x 1 = 2

15. (a) Use of local resources.
    (b) Efficient technology to reduce wastage.
    (c) Improvement of economic gains to man.
        (Any two of these)                                2 x 1 = 2

16. (a) Carnivorous.
    (b) Tertiary consumers.

17. Statement A: [first word not legible] the possible role of
    meeting any more population growth.
    Statement B: Analyses the impact of economic growth on
    supplying sufficient food [remainder not legible].

18. Birth rate and death rate
    Australia: (i) low birth rate, (ii) low death rate.
    Asia: (iii) high birth rate, (iv) low death rate.
    Europe: (v) low birth rate, (vi) high death rate.

    Work force (points (vii) to (xii))
    Few persons who are young and economically productive.
    Young population with few [not legible] in the productive
    age group.
    More persons are young and in the productive working age
    group.

    Any 10 correct points out of 12.


Question-wise Analysis

[Table printed sideways in the original and not recoverable row by row from this copy. Its columns are: objective, specification, content unit, type of question, marks allotted, estimated time for answering and estimated difficulty level.]


CHAPTER XIV 


TECHNOLOGY OF SETTING BETTER 
QUESTION PAPERS 


1. Introduction


One of the most commonly used techniques for measuring
cognitive learning is the written examination. Written tests
are, therefore, the main instruments of evaluation used in
public examinations as well as in classroom testing. This is true of
almost all the countries of the world. Written tests may take the
form of periodical tests, term tests and annual tests. Each one
of these tests is heavily loaded with verbal content and can be
scored quantitatively on a 0 to 100 point scale. Compared to
oral and practical tests, written tests are much easier to construct,
administer and score. Moreover, the tester can ensure a
good deal of objectivity in scoring, which makes them a more
valid and reliable measure of students' learning. This is precisely
the main reason why written examinations have come to
stay and have attracted more attention from educators and
researchers than have oral or practical examinations. In this
chapter an attempt is made to enable the reader to


(a) recall the various steps in setting better question papers. 

(b) design the question papers to ensure maximum validity. 

(c) develop the blue-print of a question paper on the basis of a
given design.

(d) select or construct the appropriate forms of questions in
conformity with the blue-print.

(e) develop scoring key and marking schemes for each question 
of a question paper for improving scoring objectivity. 

(f) edit the question paper through question-wise analysis. 

(g) judge the given question paper and suggest improvement. 


2. Need for Cognizance of Objectives 


In spite of the fact that written tests have been in vogue for centuries,
it was only during the nineteen-fifties that the need for
objective-based testing was realised. The main impetus to recognise
instructional objectives as the basis for teaching and testing was
given by Bloom's Taxonomy of Educational Objectives: Cognitive
Domain (1956) (1). Thereafter a lot of work on objectives has been
done. In India too, this philosophy of behaviourism caught the
fancy of researchers and curriculum workers. In fact, a lot of
work on objective-based testing has been done by the National
Council of Educational Research and Training (NCERT), New Delhi, since
the early sixties.

In order to improve the validity of written tests, the need for
formulating instructional objectives in advance cannot be
over-emphasised. Objectives in the cognitive domain are generally stated
in terms of Knowledge, Comprehension, Application, Analysis,
Synthesis and Evaluation, according to Bloom's taxonomy. However,
in India the NCERT has developed its own taxonomy of
instructional objectives after working with thousands of teachers
and paper-setters who were involved in evaluation courses organised
for the training of paper-setters. Keeping in view the
researches on Bloom's taxonomy and the difficulty an average
teacher has in comprehending and using it, the NCERT
has worked out a more realistic and workable taxonomy (2), which is
now in use in almost all the states of India. This is a three-tier
taxonomy comprising Knowledge, Understanding and Application
objectives arranged in hierarchical order. Each of these objectives is
further specified in terms of expected learning outcomes. The three
objectives which form the basis of testing cognitive learning are
depicted in the following sketch, which has become almost a cliche in
the field of written examinations. From the diagram it is
evident that three major objectives of biology teaching have been
cognised, viz. knowledge of biological facts and phenomena,
understanding of biological concepts and principles, and their application
to new or unfamiliar situations. All the three are depicted in order of
complexity. The knowledge objective is broken down into two,
understanding into eight and application into seven specifications or
expected learning outcomes. It is assumed that the knowledge
objective is subsumed under the understanding objective, which in turn
is subsumed under the application objective. Thus a question on
understanding automatically tests knowledge as well, while an
application question tests understanding as well as knowledge.
All these objectives of biology are stated in behavioural terms as
given in Appendix-A.

The validity of a test is further maintained by basing questions on 
these specifications which are adequately covered during testing. 
Moreover, proportional weightage is given to the three objectives 
depending upon the previous learning of students and considering 
the difficulty level at which a test is to be developed. Thus the role of 
objectives is very significant in the assessment of cognitive objec- 
tives of learning in biology. 


3. Designing a Question Paper 


Though planning a question paper is considered a routine
activity for teachers, analysis of teacher-made tests
reveals that the setting of question papers is far from satisfactory,
especially when considered in terms of coverage of instructional
objectives, sampling of content, wording of questions and
scoring objectivity. This is true of both external and internal
examinations in vogue. Nevertheless, the need for well-designed
question papers is now increasingly felt. In fact, all the Boards of
Secondary Education/School Education in India have become
cognizant of the need for training paper-setters to make
external examinations a more valid and reliable measure of students'
achievement. Such programmes are regularly organised by
the National Council of Educational Research and Training in
collaboration with the various examining boards in India. These
training courses are conducted to develop expertise in preparing
good question papers and unit tests. The emphasis in such courses is
on objective-based testing. The usual pattern of designing question
papers vis-a-vis unit tests for written examinations is reflected in
the Boards' question papers (3).


Basic Requirements 
The basic requirement of any given instrument of evaluation is
to ensure its validity, reliability, objectivity and practicability to
the maximum extent. Improvements relate to the administrative
as well as the academic features of paper setting. As far as the
administrative aspects are concerned, decisions are to be taken
about the total time for the question paper, the total marks, the
number of questions, the scheme of sections, the scheme of options
etc. As far as the academic aspects are concerned, these relate to the
coverage of objectives, coverage of content, form of questions,
quality of questions, the arrangement of questions, marking
scheme, model answers etc. Each of these aspects is explained
hereinafter.


(i) Total time

At the secondary or senior secondary stage, the time allotted for
a question paper varies from about 2 hours to 3 hours, while in
the case of a unit test it may vary from 15 minutes to 45
minutes, depending upon the length of the period.
But a decision about the total time is the first requisite
for designing a question paper.

(ii) ...

... of including essay ... increases, ...

(iii) Sections

Whether a question paper should have one or more sections
also depends on the type of questions to be included. If
multiple-choice questions are introduced, a separate time limit may
be necessary. But if it is a unit test with very few multiple-choice
questions, there is no use having a separate time limit for these
questions. Moreover, from the administrative point of view, it
becomes rather difficult to ensure effective separate administration
of two types of tests. Sections are sometimes made content-wise,
in which case it is advantageous to require candidates to attempt
a fixed number of questions from each section in order to
discourage selective study. Likewise, the form of the question can also be
the basis for various sections, and it does facilitate scoring
besides analysis of results for feedback.


(iv) Scheme of options

In most examinations free over-all option is discouraged because
of the non-comparability of results. However, internal options
of the either-or type are acceptable, with the proviso that optional
questions should be balanced in terms of the objective tested,
the content area, the length of expected response and the
difficulty level of the question.

The following aspects of design relate more to the academic
side of the question paper.


(v) Coverage of Instructional Objectives 

Knowledge, understanding, application and skill (drawing skill
only) objectives are usually tested in the written examinations.
These objectives are given proportionate weightage for the
purpose of paper setting. The weightages may vary from one
examining board to the other depending upon the variation, the
previous background of students on these objectives, the emphasis
on the intended learning outcomes and the difficulty level of the
question paper as a whole. This weightage, when applied to
unit testing, would vary from unit to unit depending upon the
nature, scope and content of the unit. One of the designs of
question papers reveals the following weightages given to
different objectives in annual examinations.


Weightages to Objectives

Weightage     Knowledge   Understanding   Application   Skill       Total
              Objective   Objective       Objective     Objective

Marks         20          15              10            5           50
Percentage    40          30              20            10          100
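
The arithmetic behind such a weightage table can be sketched in a few lines of Python. This is only an illustration: the function name `marks_from_percentages` is an assumption, and the figures are the suggestive percentages quoted above for a paper of 50 marks.

```python
def marks_from_percentages(percentages, total_marks):
    """Convert percentage weightages into marks for a paper of total_marks."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentage weightages must add up to 100")
    return {name: total_marks * pct / 100 for name, pct in percentages.items()}


# Percentage weightages to objectives, as in the suggestive table above.
objective_weightage = {
    "Knowledge": 40,
    "Understanding": 30,
    "Application": 20,
    "Skill": 10,
}

objective_marks = marks_from_percentages(objective_weightage, 50)
for objective, marks in objective_marks.items():
    print(f"{objective:<14} {marks:>5.1f} marks")
```

The check on the percentage total is a small safeguard, since a design whose weightages do not add up to 100 cannot be converted into a consistent allocation of marks.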


(vi) Coverage of content/syllabus

To ensure adequate coverage of content and better sampling of
questions on each area, the whole syllabus is divided into
convenient units numbering about 8 to 12. Each of these units is
given proportionate weightage depending upon the nature, scope
and significance of the content-elements. This weightage is
determined in terms of the unit's length, its place in the textbook, the
importance of the topic etc. The number of units should be neither too
small nor too large. A small number may lead to the neglect of
certain topics, while too many units may hinder the inclusion of
long-answer questions that carry more marks. This weightage
is represented in a table like the following:


Weightage to Content Units

UNITS OF THE SYLLABUS: Unit 1 to Unit 10, and Total
(entries are given in the order in which they survive; "..." marks lost entries)

Marks         ...   5    8    4    3    8    1   ...    50
Percentage    18   10   16    8    6   16    2   ...   100


(vii) Decision about the Form of Questions

The form of questions normally depends on the acquaintance of
students with the form of questions, the extent of reliability
acceptable, the needed objectivity in scoring, the nature of the
content and the total time stipulated for a test.

Essay-type questions are more frequently used than short-answer
and objective-type questions in most examinations.
However, in India, there is now a clear trend of increasing the
number of short-answer and objective-type questions with a
corresponding reduction in essay-type questions. The weightage
to different forms of questions as revealed in one of the
question papers is as follows:


Weightage to Form of Questions

Weightage     Essay type   Short answer type   Objective type   Total

Marks         ...          ...                 ...              ...
Percentage    ...          ...                 ...              ...

(viii) Difficulty Level

In designing the question paper, the expected difficulty-level is
also to be stated. ... determined only ...

Weightage of Difficulty-Level of Questions

Weightage     Estimated Level
              Difficult   Average   Easy   Total

Marks         10          30        10     50
Percentage    20          60        20     100

For classroom testing, the difficulty-level would depend on
the objectives to be tested, the previous standard of the class,
the nature of content and the homogeneity or heterogeneity of
the group.

All the tables given above are only suggestive and hypothetical
and are to be prepared keeping in view the requirements of
a particular question paper.

All the above-mentioned features become a part of the
design of the question paper, which indeed reflects the policy
about paper setting. For external examinations, this has to be
quite rigid, although broad-based, and should normally continue
for three years or so, as long as there is no change in the
syllabus. The paper-setter has no right to make any change in
the design of the question paper but, based on this design, he is
free to develop his own blue-print, about which a discussion
follows.


4. Preparation of the Blue Print 


Having received the design of the examining agency, a paper-setter
is expected to prepare a blue-print of the intended question
paper. This blue-print is a three-dimensional chart indicating
the distribution of various forms of questions in relation to
the various units of the syllabus, testing different objectives. The
blue-print also depicts the numerical weightage to each question
as well as the scheme of options, if any, to be used in the setting
of the question paper. The blue-print is the monopoly of the
paper-setter. However, it must be developed within
the constraints of the given design which reflects, in fact, the
policy statement of the question paper. Based on one design,
many blue-prints can be prepared, and the same blue-print can
be used for developing more than one question paper, which may
be used for a supplementary examination. The technique of preparing
a blue-print is discussed in detail in Chapter XIII on unit
testing.
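
One way of picturing the three dimensions of a blue-print is as a table keyed by objective, unit and form of question. The Python sketch below is only an illustration under assumed figures: the cell values, the names `blueprint`, `design`, `totals_along` and `matches_design`, and the 10-mark paper are all hypothetical. It totals the marks along each dimension and checks them against the weightages fixed in the design.

```python
from collections import defaultdict

# Hypothetical blue-print for a 10-mark test:
# (objective, unit, form) -> (number of questions, marks for each question)
blueprint = {
    ("Knowledge", "Unit 1", "Objective"): (2, 1),
    ("Knowledge", "Unit 2", "Short answer"): (1, 2),
    ("Understanding", "Unit 1", "Short answer"): (1, 2),
    ("Application", "Unit 2", "Essay"): (1, 4),
}

# Hypothetical design: marks fixed for each dimension
# (0 = objective, 1 = unit, 2 = form of question).
design = {
    0: {"Knowledge": 4, "Understanding": 2, "Application": 4},
    1: {"Unit 1": 4, "Unit 2": 6},
    2: {"Objective": 2, "Short answer": 4, "Essay": 4},
}


def totals_along(blueprint, axis):
    """Total the marks along one dimension of the blue-print."""
    totals = defaultdict(int)
    for cell, (count, marks_each) in blueprint.items():
        totals[cell[axis]] += count * marks_each
    return dict(totals)


def matches_design(blueprint, design):
    """True when the blue-print respects the design on every dimension."""
    return all(
        totals_along(blueprint, axis) == expected
        for axis, expected in design.items()
    )


print(totals_along(blueprint, 0))
print(matches_design(blueprint, design))
```

Because only the totals along each dimension are fixed by the design, many different sets of cells can satisfy the same check, which is the sense in which many blue-prints can be prepared from one design.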


5. Framing of Questions Based on Blue-Print 


Framing of questions may be taken up sub-unit by sub-unit, or one
particular form of question, say the objective type, may be
taken up first, followed by short-answer and essay-type questions.
Each question located within a blue-print must reflect the
same objective, content and form of question. Of course,
mastery of the subject matter, knowledge of objectives and
their specifications, and skill in framing different forms of
questions are essential. Whatever the type of question framed,
it must be based on a well-defined, pre-determined specific
objective relating to a specific content-element and has to be set
at the designed estimated level of difficulty. Each question must
satisfy the conditions of its structure. Thus, every question is
to be framed only in terms of its position indicated in the
blue-print.

It hardly needs mention that model answers, along with the
marking scheme, must be developed at the same time as
the framing of a question.


6. Editing and Consolidation 


Once the questions are ready, the next step is the consolidation
of those questions. Questions may be assembled into
various sections, which may be done on the basis of their format.
Free-response questions may be at one place while the
objective-type questions are at another. Within each format of question,
the same order may be followed, preferably in terms of content-units,
and the questions may be arranged in increasing order of
hierarchy of instructional objectives. Having arranged this, the
editing of the paper may be done, which may involve checking
each question with respect to the objective tested, the correctness
of the key, the homogeneity of the distractors, the
appropriateness of the stem, the preciseness of the language used and
the technical format of each question. There are quite a few
clues which creep in, and care may be taken during editing that
these clues are removed. An illustration of improving a
multiple-choice item is given in Annexure-A. While editing, the
instructions to the examinees may also be looked into carefully.
Wherever two sections are given, one for the objective-type questions
and the other for the free-response questions, the specified time
limit may also be checked in terms of the number of questions
and their difficulty-level.
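
The arrangement described above can be sketched as a sort on three keys. The Python below is only an illustration: the question records are hypothetical, and placing objective-type questions before short-answer and essay-type questions is an assumption, since the text asks only that each format be kept together.

```python
# Assumed order of sections by format (the text does not prescribe one).
FORMAT_ORDER = {"Objective": 0, "Short answer": 1, "Essay": 2}
# Hierarchy of instructional objectives, lowest first.
OBJECTIVE_ORDER = {"Knowledge": 0, "Understanding": 1, "Application": 2}

# Hypothetical pool of framed questions.
questions = [
    {"id": "a", "form": "Essay", "unit": 1, "objective": "Understanding"},
    {"id": "b", "form": "Objective", "unit": 2, "objective": "Knowledge"},
    {"id": "c", "form": "Objective", "unit": 1, "objective": "Application"},
    {"id": "d", "form": "Objective", "unit": 1, "objective": "Knowledge"},
    {"id": "e", "form": "Short answer", "unit": 1, "objective": "Knowledge"},
]


def arrange(questions):
    """Group by format, then by content unit, then by objective hierarchy."""
    return sorted(
        questions,
        key=lambda q: (
            FORMAT_ORDER[q["form"]],
            q["unit"],
            OBJECTIVE_ORDER[q["objective"]],
        ),
    )


ordered_ids = [q["id"] for q in arrange(questions)]
print(ordered_ids)
```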


7. Preparation of Scoring Key and Marking Scheme


The scoring key and the marking scheme must be developed
simultaneously with the framing of questions. In objective-type
questions, the key must be checked and indicated, while in the case
of short-answer questions the complete answer, indicating
clearly the credit points, should be written out. In the case of essay-type
questions, the expected outline answers, along with marks for
each value point, should be indicated. Each of these value
points must be given weightage proportionately. Where more
than one answer is possible and the students have to choose a
limited number, the marking scheme should indicate that any
of the given points, as expected answers, will be taken as correct.
Efforts may be made to minimise the freedom of the examiner
to mark in his own way. The best marking scheme is one which,
when used by many examiners, tends to homogenise the marking
pattern.
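
A marking scheme of the "any two of these" kind can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the scheme, the labels of the value points and the function `score_answer` are hypothetical, and the examiner is assumed to have already noted which value points appear in an answer.

```python
# Hypothetical marking scheme for one short-answer question:
# three acceptable value points, of which any two earn credit.
scheme = {
    "value_points": {"a": 1, "b": 1, "c": 1},  # marks for each value point
    "max_points": 2,                           # "any two of these"
}


def score_answer(credited_points, scheme):
    """Add up marks for the value points found, up to the permitted number.

    credited_points is the set of value-point labels the examiner has
    identified in the answer; labels not in the scheme earn nothing.
    """
    earned = sorted(
        (scheme["value_points"][p] for p in credited_points
         if p in scheme["value_points"]),
        reverse=True,
    )
    return sum(earned[: scheme["max_points"]])


print(score_answer({"a", "b", "c"}, scheme))
print(score_answer({"c"}, scheme))
```

Because every examiner applies the same list of value points and the same ceiling, two examiners who identify the same points in an answer must arrive at the same marks, which is the homogenising effect the text asks of a good marking scheme.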


8. Preparation of the Question-wise Analysis 


The last step is the analysis of each question. This can be
done by means of a proforma in which information about each question
is entered under various columns, which usually consist of
objective, specification, form of question, unit/sub-unit, marks
allotted, estimated time requirement and estimated difficulty
level. From these details, when summarised in a tabulated form
(4) (see Annexure-B), one can get an idea of the distribution
of questions with different objectives in relation to
various units. This helps to validate the paper against the blue-print
developed for it. The analysis can also be
undertaken by indicating the serial number of a question at the
appropriate place in the blue-print to see if it satisfies the
dimensions of the blue-print. The question-wise analysis also
helps the paper-setter to judge whether the question paper as a whole
would be difficult or easy, or is likely to cater to all the three types
of students: average, poor and bright. Even at this stage, certain
questions may be made more difficult or easier besides, of course,
rectifying any incongruency or inadequacy which might be observed
during analysis.
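
The tabulation described above can be sketched in Python. The rows, the blue-print cells and the function names below are hypothetical; the sketch merely shows how a question-wise proforma might be summarised and compared with the blue-print.

```python
from collections import Counter

# Hypothetical question-wise analysis proforma, one row per question.
analysis = [
    {"q": 1, "objective": "Knowledge", "unit": "Unit 1", "marks": 1,
     "minutes": 1, "difficulty": "Easy"},
    {"q": 2, "objective": "Understanding", "unit": "Unit 1", "marks": 2,
     "minutes": 4, "difficulty": "Average"},
    {"q": 3, "objective": "Application", "unit": "Unit 2", "marks": 4,
     "minutes": 8, "difficulty": "Difficult"},
    {"q": 4, "objective": "Knowledge", "unit": "Unit 2", "marks": 2,
     "minutes": 3, "difficulty": "Average"},
]

# Hypothetical blue-print cells: marks intended for each objective and unit.
blueprint_cells = {
    ("Knowledge", "Unit 1"): 1,
    ("Understanding", "Unit 1"): 2,
    ("Application", "Unit 2"): 4,
    ("Knowledge", "Unit 2"): 1,
    ("Understanding", "Unit 2"): 1,
}


def summarise(analysis):
    """Tabulate marks by objective and unit, marks by difficulty, and total time."""
    by_cell, by_difficulty, total_minutes = Counter(), Counter(), 0
    for row in analysis:
        by_cell[(row["objective"], row["unit"])] += row["marks"]
        by_difficulty[row["difficulty"]] += row["marks"]
        total_minutes += row["minutes"]
    return by_cell, by_difficulty, total_minutes


def deviations(by_cell, blueprint_cells):
    """Cells where the paper and the blue-print disagree: (paper, blue-print)."""
    cells = set(by_cell) | set(blueprint_cells)
    return {
        cell: (by_cell.get(cell, 0), blueprint_cells.get(cell, 0))
        for cell in cells
        if by_cell.get(cell, 0) != blueprint_cells.get(cell, 0)
    }


by_cell, by_difficulty, total_minutes = summarise(analysis)
print(dict(by_difficulty), total_minutes)
print(deviations(by_cell, blueprint_cells))
```

In this made-up case the paper gives two marks to Knowledge in Unit 2 where the blue-print allows one, and nothing to Understanding in Unit 2, which is the kind of incongruency the analysis is meant to bring to light before the paper is finalised.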


9. Predominance of Objectives in Paper Setting 
9.1. Performance Standard and Objectives 


Once instructional objectives are formulated for a unit of teaching
and instruction is directed towards the attainment of those
objectives, pupils' performance has to be judged in terms of
standards laid down in the form of predetermined specific objectives
or intended learning outcomes. Students' scores on different
objectives indicate their level of performance. The same score on
two different objectives reflects different standards. One
student's score of 20 on the Knowledge objective and another
student's 20 marks on the Application objective are not
comparable: the performance standard of
the latter is likely to be higher than that of the former. Such an
intangible difference, which is significant from the point of view of
learning, is submerged in the ocean of total marks. However,
for evaluation in school, this anomaly can be obviated by
comparing students' performance in terms of the three aforesaid
objectives. In view of the increasing importance being given
to internal assessment in the new pattern of schooling, and the
apprehension of lowering of standards, a question paper providing
for questions set on all the three levels can be profitably
used to make inter-institution and even inter-state comparisons
of students.
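
The point that equal totals can hide unequal standards may be illustrated with a short Python sketch. The two students, their scores and the maximum marks per objective are hypothetical figures chosen only to make the contrast visible.

```python
# Hypothetical maximum marks for each objective in a paper.
maximum = {"Knowledge": 20, "Understanding": 15, "Application": 10}

# Two hypothetical students with the same total of 30 marks.
student_a = {"Knowledge": 20, "Understanding": 10, "Application": 0}
student_b = {"Knowledge": 10, "Understanding": 10, "Application": 10}


def profile(scores, maximum):
    """Percentage of the available marks earned on each objective."""
    return {
        objective: round(100 * scores[objective] / maximum[objective])
        for objective in maximum
    }


total_a = sum(student_a.values())
total_b = sum(student_b.values())
profile_a = profile(student_a, maximum)
profile_b = profile(student_b, maximum)

print(total_a, profile_a)
print(total_b, profile_b)
```

Both totals are 30, yet the first profile shows no attainment at the application level while the second shows full attainment there, a difference that disappears when only total marks are reported.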


9.2. Why not only Application Questions 


While setting a question paper, it is desirable to give due
weightage to the Knowledge, Understanding and Application
objectives. This weightage can be worked out in the light of
previous standards of students' performance in school or in the
Board's examination. An analysis of the previous three years'
results, preferably objective-wise and possibly on a
sample basis, can form the basis for allocating weightages to
different objectives. A gradual decrease in weightage to the knowledge
objective and a corresponding increase in the other two
objectives, say every fourth year, will surely lead to improvement
in the standards.

If the three objectives are in hierarchical order as claimed
earlier, will it then not suffice to have only application questions,
which automatically test understanding and knowledge? This
should work all right when a question paper is meant for
inter-class, inter-school or inter-state comparison. But when
used for formative evaluation in day-to-day classroom testing,
this would not work, because the purpose is not merely the
measurement of students' achievement but also the improvement
of their achievement. For this, the tool must provide evidence
which helps to discriminate the students at various ability levels.
That is why questions testing all the three objectives need to
be included.


9.3. Content versus Objective 


Content is the medium for appraising the students' ability
implied by each objective. Should content determine objectives,
or should objectives determine the content? In fact, both are
complementary. It is the analysis of content that reveals the
possibility of an objective which could be achieved through that
content. Every concept may not be amenable to testing at all
levels of objectives. On the other hand, if objectives are identified
first, then a specific content or concept can be selected to
test that objective. In unit testing the former approach may be
more suitable, while in a full-fledged question paper based on
the whole syllabus the latter approach is desirable. So the
complementarity of objective and content needs to be appreciated.


9.4. Question Form and Objective 


An instructional objective, if clearly understood and predetermined,
facilitates the selection of the form of questions to be included
in the question paper. In fact the form of question is conditioned
by the objective to be tested. For example, if we are interested
in testing abilities like investigation, summarisation,
expression, reorganisation or evaluation, the form of
question that suits is the essay type only. Likewise, if knowledge of
the fundamentals of a subject is the aim, as in the case of a mastery
test, objective-type questions of the multiple-choice variety may
suit more from the point of view of wider coverage. If one is
interested in reviewing a lesson in terms of objectives in a limited
time, then short-answer type questions suit more. In classroom
testing it is the best form for oral testing. Cognizance of
objectives is, therefore, essential in deciding about the form of
questions to be included in a question paper.


9.5. Options and Objective


The place of options in paper setting is quite controversial. Overall
option in an external examination is discouraged to make it a
more reliable tool of measurement. Internal options of the either-or
type have come to stay, and their use in internal and external
examinations cannot be ruled out. Nevertheless, it is not the
option that is important but what the optional questions
measure. Unless the optional questions are comparable in
terms of what they test, their validity may be jeopardised.
A question testing the ability to apply, if set against an optional
question testing mere recall or even understanding, cannot be
justified. Optional questions should test the same objective,
apart from using the same form of question and requiring
approximately the same length of expected response and the same content area.
When all these requirements are met, it makes the optional
questions balanced. Commonality of objective is the first
criterion when one or more options to the same question are
provided. Ignoring this aspect means sacrificing a lot of validity
of the question paper at the altar of options.
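
The balance of an either-or pair can be checked mechanically once each option is described by the features listed above. The Python sketch below is only an illustration: the field names, the sample options and the 25 per cent tolerance on the length of expected response are assumptions, not figures from any Board's design.

```python
def imbalance(option_a, option_b, max_length_gap=0.25):
    """List the respects in which two optional questions are not balanced.

    max_length_gap is an assumed tolerance: the shorter expected response
    may fall short of the longer one by at most this fraction.
    """
    problems = [
        field
        for field in ("objective", "form", "unit", "difficulty")
        if option_a[field] != option_b[field]
    ]
    longer = max(option_a["expected_words"], option_b["expected_words"])
    shorter = min(option_a["expected_words"], option_b["expected_words"])
    if longer and (longer - shorter) / longer > max_length_gap:
        problems.append("expected_words")
    return problems


# Hypothetical either-or options.
first = {"objective": "Application", "form": "Essay", "unit": "Unit 3",
         "difficulty": "Average", "expected_words": 150}
recall_option = {"objective": "Knowledge", "form": "Essay", "unit": "Unit 3",
                 "difficulty": "Average", "expected_words": 140}
short_option = {"objective": "Application", "form": "Essay", "unit": "Unit 3",
                "difficulty": "Average", "expected_words": 100}

print(imbalance(first, recall_option))
print(imbalance(first, short_option))
print(imbalance(first, first))
```

An empty list means the pair is balanced on every listed feature; the first comparison fails on the objective, which the text treats as the first criterion.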


9.6. Objectives and Sections/Parts


In some question papers there are two or three parts or sections
and students are expected to choose a given number of questions
from each, e.g. in the case of Social Studies, where History, Civics and
Economics are the major areas of testing. A similar case sometimes
arises in Biology, where the Botany and Zoology portions are
separated into different sections and students are asked to attempt
at least, say, two questions from each part out of the total.
In such cases, there is again every possibility that the optional
questions within a section may prompt the students to select
simple knowledge-based questions in preference to understanding
or application-based questions, which are more difficult to
attempt. Here again we sacrifice a lot of validity unless care
has been taken to keep all questions comparable in terms of
objectives and content areas.


9.7. Intended Learning Outcomes and Editing 


Analysis of a number of question papers of various Boards of
Secondary Education by this writer reveals that even when
questions are based on different objectives and the stipulated
weightage to the various objectives is maintained, the true nature of
an objective is not reflected. For example, in testing for
understanding it is seldom that questions testing students' ability to
interpret, extrapolate or infer are included. Most of the questions
are based on classification or citing illustrations. Similarly,
it is not uncommon to find application questions limited to
testing students' ability to reason and predict. Rarely would one
find a question testing students' ability to formulate a hypothesis,
suggest procedures to test a given hypothesis or
evaluate a given situation. How does this affect the quality of
the question paper? Unless there is adequate sampling of the
abilities (intended learning outcomes) implied by an objective,
the purpose of evaluation is defeated. As shown in the diagram
depicting the relationship of the three objectives, it is the
specifications, i.e. the intellectual abilities, that indicate the scope of
each objective. The more adequately the specifications are
covered under each objective, the more valid the question paper
would be.

Since teaching is an integrated act, one is neither expected to
teach each of the specifications one by one, nor is it desirable
to do so. Nevertheless, in evaluation vis-a-vis paper setting, a
deliberate attempt must be made to ensure adequate coverage of
these specifications. Some of these can be tested in isolation
while others can be integrated within a question. In a nutshell,
what is needed is the adequate coverage of the abilities listed
under each objective.


9.8. Objectives and Question-wise Analysis


Analysis follows the collection of data after administration of
the test. If the test results are analysed in terms of objectives, as
could be done in the case of board examinations, or even in terms of
specified objectives, as could be possible in the case of classroom
testing, the analysis has great diagnostic value. It provides data about
students' strengths and weaknesses in terms of the developed
and the developing abilities. In addition, it gives indirectly an
indication of the instructional impact on the development of those
abilities. Besides content-wise analysis, objective-wise analysis of students'
achievement is necessary to appraise students' growth in terms
of intended learning outcomes, so that instruction, if need be, can be
improved.

The same principle applies to the reporting and feedback
mechanism. If interpretation is done in terms of objectives and
evidence is fed back by the Boards to the institutions, it can
provide the teachers, administrators and in-service agencies with
useful data to readjust instructional objectives, adapt instruction
to the modified goals and effect needed improvements in
the measuring devices of evaluation. Such an approach brings
realism into the teaching-learning process. On the one hand
instructional objectives become the basis of testing, and on the
other hand they act as the criterion behaviour against which both
teaching and testing are validated.


10. To Sum Up 


From the above discussion on the technology of paper-setting,
it is quite obvious that the framing of a good question
paper is not an overnight job but requires a good deal of insight
into the basic frame of reference, which provides guidelines on
the one hand and restrictions on the other regarding the framing
of questions of a particular variety with reference to a
particular unit of teaching. If a paper-setter is cognizant of the
validity and reliability of his question paper, he should follow all
the steps listed above. If something goes wrong with the
paper-setting, nothing can be set right later regarding the judgements
made by the evaluators about students' learning.


ANNEXURE A

IMPROVING A MULTIPLE CHOICE ITEM

1. Our body is made up of various types of cells, tissues,
   organs and the systems. There is some relationship between
   a tissue and the organ. Same relationship does not exist
   except between one of the following.
    A. Fats and carbohydrates
    B. Species and genera
    C. Birds and fishes
    D. Plants and animals
   *E. An organ and system in the body

2. Our body is made up of various types of cells, tissues,
   organs and the systems. There is some relationship between
   a tissue and the organ. Same relationship does not exist
   between one of the following.
    A. Fats and carbohydrates
    B. Species and genera
    C. Birds and fishes
    D. Plants and animals
   *E. An organ and system in the body
   (Double negative in stem is removed)

3. Our body is made up of various types of cells, tissues,
   organs and systems. There is some relationship between a
   tissue and the organ. Same relationship exists between one
   of the following.
    A. Fats and carbohydrates
    B. Species and genera
    C. Birds and fishes
    D. Plants and animals
   *E. An organ and system in the body
   (Negative 'does not' is removed in the stem)

4. There is some relationship between a tissue and the organ.
   Same relationship exists between the following:
    A. Fats and carbohydrates
    B. Species and genera
    C. Birds and fishes
    D. Plants and animals
   *E. An organ and system in the body
   (Instructional side is removed in the stem)

5. There is some relationship between a tissue and the organ, the
   same relationship exists:
    A. Between fats and carbohydrates
    B. Between species and genera
    C. Between birds and fishes
    D. Between plants and animals
   *E. Between an organ and system in the body
   (Task is set in the stem)

* is the key expected




6. Relationship between a tissue and the organ is the same as
    A. Between fats and carbohydrates
    B. Between species and genera
    C. Between birds and fishes
    D. Between plants and animals
   *E. Between an organ and system in the body
   (Language of stem is made precise)

7. Relationship between a tissue and the organ is similar to
   the one
    A. Between fats and carbohydrates
    B. Between species and genera
    C. Between birds and fishes
    D. Between plants and animals
   *E. Between an organ and system in the body
   (Language made more appropriate in the stem)

8. Relationship between a tissue and the organ is most similar
   to the one
    A. Between fats and carbohydrates
    B. Between species and genera
    C. Between birds and fishes
    D. Between plants and animals
   *E. Between an organ and system in the body
   (Language made more specific by adding qualifier in the stem)

9. Relationship between a tissue and the organ is most similar
   to the one between
    A. Fats and carbohydrates
    B. Species and genera
    C. Birds and fishes
    D. Plants and animals
   *E. An organ and system in the body
   (Stem is made more inclusive to avoid repetition of 'between')

10. Relationship between a tissue and the organ is most similar
    to the one between
     A. Fat and a carbohydrate
     B. Species and a genus
     C. Bird and a fish
     D. Plant and an animal
    *E. An organ and system in the body
    (Number clue of 'key' is removed)

11. Relationship between a cell and the tissue is most similar
    to the one between
     A. Fat and carbohydrate
     B. Species and genus
     C. Bird and fish
     D. Plant and animal
    *E. An organ and system in the body
    (Verbal clue in 'organ' is removed)

12. Relationship between a cell and the tissue is most similar
    to the one between
     A. Fat and a carbohydrate
     B. Species and a genus
     C. Bird and a fish
     D. Plant and an animal
    *E. Organ and the system in the body
    (Article clue of 'an' is removed from key)

13. Relationship between a cell and the tissue is most similar
    to the one between
     A. Fat and a carbohydrate
     B. Species and a genus
     C. Bird and a fish
     D. Plant and an animal
    *E. Organ and the system
    (Length clue from the key 'E' is removed)

14. Relationship between a cell and the tissue is most similar
    to the one between
     A. Fat and a carbohydrate
     B. Host and a parasite
     C. Bird and a fish
     D. Plant and an animal
    *E. Organ and the system
    (Indistinct Distractor-B is replaced)

15. Relationship between a cell and the tissue is most similar to
    the one between
     A. Fat and a carbohydrate
     B. Host and a parasite
     C. Bird and a fish
     D. Liver and the gall bladder
    *E. Organ and the system
    (Distractor-D is made more homogeneous)

16. Relationship between a cell and the tissue is most similar
    to the one between
     A. Fat and a carbohydrate
     B. Host and a parasite
     C. Bird and a reptile
     D. Liver and the gall bladder
    *E. Organ and the system
    (Inconsistency in Distractor-C is removed)

17. Relationship between a cell and the tissue is most similar
    to the one between
     A. Fat and a carbohydrate
     B. Host and a parasite
     C. Ectoderm and an endoderm
     D. Liver and the gall bladder
    *E. Organ and the system
    (Distractor-C is made more plausible in content)

* is the key expected


18. Relationship between a cell and the tissue is most similar
    to the one between
     A. Fat and a carbohydrate
     B. Ptyalin and the starch
     C. Ectoderm and an endoderm
     D. Liver and the gall bladder
    *E. Organ and the system
    (Distractor-B is made more plausible in content)

19. Relationship between a cell and the tissue is most similar
    to the one between
     A. Ileum and the intestine
     B. Ptyalin and the starch
     C. Ectoderm and the endoderm
     D. Liver and the gall bladder
    *E. Organ and the system
    (Distractor-A is made more plausible in content)

20. Relationship between a cell and the tissue is most similar
    to the one between
     A. Ileum and intestine
     B. Ptyalin and starch
     C. Ectoderm and endoderm
     D. Liver and gall bladder
    *E. Organ and system
    (Language of options is made more concise)

21. Relationship between a cell and the tissue is most similar
    to the one between
     A. Ileum and intestine
     B. Ptyalin and starch
     C. Ectoderm and endoderm
     D. Liver and gall bladder
    *E. Neurone and the nerve
    (Key is made more homogeneous)

22. Relationship between a cell and the tissue is most similar
    to the one between
     A. Ileum and the intestine
     B. Ptyalin and the starch
     C. Ectoderm and the endoderm
     D. Liver and the gall bladder
    *E. Neurone and the nerve
    (Language of option is made grammatically more appropriate)

23. Relationship between a cell and the tissue is most similar
    to the one between
     A. Ileum and the intestine
     B. Ptyalin and the starch
     C. Ectoderm and the endoderm
     D. Liver and the gall bladder
    *E. Neurone and the nerve
    (Format of item is improved)

* is the key expected


ANNEXURE-B

SAMPLE QUESTION PAPER IN BIOLOGY

Design
QUESTION PAPER/UNIT TEST

Subject    : Biology                      Class : IX
Unit/Paper : I                            Time  : 2½ hrs.
                                          Marks : 50

1. Weightage to Objectives:
   Objectives             K     U     A     S
   Percentage of marks    40    32    16    12
   Marks                  20    16     8     6

2. Weightage to Form of Questions:
   Forms of Questions     E         SA    VSA    O
   No. of Questions       2         12    14     16
   Percentage of marks    20        50    15     15
   Marks Allotted         10        25     7      8
   Estimated Time         50 mts.   60    14     16

3. Weightage to Content:
   Units/Sub-Units
    1. Living Organisms
    2. The Cell
    3. Cell Wall, Cell Organelles and Cell Inclusions
    4. Cell Division
    5. Plant and Animal Tissues
    6. Classification
    7. Bacteria and Viruses
    8. Flowerless Plants
    9. Flowering Plants
   10. Animal Life
       Total : 50



4. Scheme of Sections : Section-A, consisting of VSA and Objective
                        type questions, of 30 mts. duration,
                        to be administered separately.

5. Scheme of Options  : No option is provided.

6. Difficulty Level   : Difficult : 20% marks.
                        Average   : 60% marks.
                        Easy      : 20% marks.

Abbreviations: K (Knowledge), U (Understanding), A (Application),
               S (Skill), E (Essay type), SA (Short Answer
               type), VSA (Very Short Answer type),
               O (Objective type).
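
A design of this kind can be checked mechanically before the blue print is drawn up. The following is only a minimal sketch in Python, not part of the handbook's procedure: the function name `check_weightage` and the dictionary layout are assumptions made for illustration, while the figures are those of the design above. It confirms that the percentages add up to 100 and the marks to 50, and reports where the allotted marks depart from the stated percentage.

```python
from fractions import Fraction

TOTAL_MARKS = 50

# (percentage of marks, marks allotted), copied from the design above.
objectives = {"K": (40, 20), "U": (32, 16), "A": (16, 8), "S": (12, 6)}
forms = {"E": (20, 10), "SA": (50, 25), "VSA": (15, 7), "O": (15, 8)}


def check_weightage(table, total=TOTAL_MARKS):
    """Return (sum of percentages, sum of marks, deviation per category).

    The deviation is marks allotted minus the marks implied by the
    percentage, kept as an exact fraction so that half marks are not lost.
    """
    pct_sum = sum(pct for pct, _ in table.values())
    marks_sum = sum(marks for _, marks in table.values())
    deviations = {
        name: Fraction(marks) - Fraction(pct * total, 100)
        for name, (pct, marks) in table.items()
    }
    return pct_sum, marks_sum, deviations


if __name__ == "__main__":
    for label, table in (("Objectives", objectives), ("Forms", forms)):
        pct_sum, marks_sum, deviations = check_weightage(table)
        print(label, pct_sum, marks_sum,
              {k: str(v) for k, v in deviations.items() if v != 0})
```

For the objectives every deviation is zero. For the forms of questions, 15 per cent of 50 is 7½ marks, so the 7 and 8 marks allotted to VSA and O differ from the percentage by half a mark each way, which is consistent with 14 and 16 questions carrying half a mark each.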


BLUE PRINT

Subject : Biology          Class : IX          Paper : I
Maximum Marks : 50         Time : 2½ hrs.

[Blue print table: units/sub-units 1 to 10 in rows, against the
objectives Knowledge, Understanding, Application and Skill, each
subdivided by form of question (E, SA, VSA, O), with totals. The
cell entries are not legible in the source.]

Summary:
   Essay (E)                  No.  2    Marks 10
   Short Answer (SA)          No. 12    Marks 25
   Very Short Answer (VSA)    No. 14    Marks 07
   Objective (O)              No. 16    Marks 08

Note: Figures within brackets indicate the number of questions and
      figures outside the brackets indicate marks.
      * Denotes that marks have been combined to form one question.


APPENDIX 


SAMPLE QUESTION PAPER BIOLOGY

Marks : 50
Time : 2½ hrs.
Instructions

1. All questions are compulsory. Each question in Section-A
   carries half mark.

2. In Section-A, Question Nos. 1-14 require a very short answer
   which may be one word, one phrase, or a sentence.
   Answers to these questions are to be written in the space
   provided in the question paper itself.

3. In questions 15 to 30 only one answer out of the 4 given is
   correct. The serial number of the correct answer is to be
   encircled.

4. Section-A will be collected back by the invigilator after
   30 minutes.

5. In Section-B, marks are given against each question from
   Q. No. 31 to 44.

6. Questions 31 to 42 require short answers involving 2 to 4
   credit points and can be answered in about 20 to 40 words.
   Only precise answers are expected.

7. Question Nos. 43 and 44 require long answers which should
   not exceed 400 words or so.


Section—A 


Time : 30 mts. 


1. What term is used for the characteristic of living organisms
   to respond to alterations in the environment?

2. What term is used for the folds of the inner membrane of
   mitochondria projecting towards its interior?

3. If a child is gradually losing weight, which of the two
   processes, anabolic or catabolic, may be faster?

4. Which cell organelle is called the suicidal bag of the cell?

5. Which biologist postulated for the first time that every cell
   arises from a pre-existing cell?

6. Give an example of a cell which changes its shape very
   frequently.

7. Corresponding genes have the same set of traits. What is
   such a pair called?

8. Which fundamental tissue in plants helps to prevent tearing
   of the plant body when a strong wind is blowing?

9. Which of the two phases, the sporophytic or the gametophytic,
   is dominant in thallophytes?

10. In which of the two bacterial cells, having the same number
    of mesosomes but different surface area-volume ratios, will
    the rate of metabolism be more?

11. Observe the diagram given opposite and identify the
    organism.

    [Diagram labels: Capsomere, Nucleic acid]

12. Which organelle in Amoeba serves the same function as
    the kidney cell in an animal?

13. Small terminal air sacs of the lungs concerned with gaseous
    exchange are called ..................

14. What is the basic unit of a nervous tissue?
15. Which of the following organisms can be placed both at the
    cellular and the individual level of biological organisation?
    A. Liver
    B. Amoeba
    C. Rhizopus
    D. Hydra

16. Cell theory does NOT state that all cells
    A. perform metabolic activity.
    B. arise from pre-existing cells.
    C. contain hereditary material.
    D. originate spontaneously.

17. A pair of two longest cells are:
    A. Amoeba and nerve cell.
    B. Ostrich egg and human cell.
    C. Nerve cell and cotton fibre cell.
    D. Muscle cell and kidney cell.

18. The numbers of chromosomes in the body cells of man, dog and
    horse are in the increasing order of
    A. man, horse, dog.
    B. dog, horse, man.
    C. man, dog, horse.
    D. horse, dog, man.

19. Which of the following illustrations involves meiotic cell
    division?
    A. Healing of wounds.
    B. Regeneration of damaged parts.
    C. Formation of tumours.
    D. Source of new genetic variations.

20. In which of the following phases of mitosis does the centromere
    of each chromosome split into two chromatids?
    A. Prophase
    B. Metaphase
    C. Anaphase
    D. Telophase

21. As a result of injury internal tissues come in direct contact
    with
    A. Columnar epithelium.
    B. Cuboidal epithelium.
    C. Squamous epithelium.
    D. All the above.

22. At maturity which of the following is the living component
    of xylem?
    A. Xylem fibres.
    B. Xylem parenchyma.
    C. Xylem tracheids.
    D. Xylem vessel.

23. Which of the following is NOT a fish?
    A. Sucker fish.
    B. Flying fish.
    C. Star fish.
    D. Dog fish.

24. Which of the following characteristics can help you to
    distinguish a pteridophyte from a bryophyte?
    A. Dominant gametophyte.
    B. Sporophyte dependent on gametophyte.
    C. Gametophyte dependent on sporophyte.
    D. Dominant sporophyte.

25. Which of the following is a modification of stem but serves
    the function of a leaf?
    A. Tendril of sweet pea.
    B. Phylloclade of Opuntia.
    C. Pitcher of Nepenthes.
    D. Phyllode of Acacia.

26. A fish when taken out of water immediately dies because
    A. gill slits close in the absence of water.
    B. gill slits open too much in air.
    C. respiration through skin does not take place.
    D. fish is a cold blooded animal.

27. In which of the following animals does blood play no part in
    respiration?
    A. Cockroach.
    B. Earthworm.
    C. Frog.
    D. Fish.

28. Which of the following structures is concerned with
    asexual reproduction?
    A. Egg.
    B. Sperm.
    C. Bud.
    D. Gamete.

29. In which of the following tissues is columnar epithelium
    found?
    A. Alveoli of lung.
    B. Epidermis of skin.
    C. Germinal layer of ovary.
    D. Lining of intestine.

30. Naked seeds are found in both
    A. Pinus and Cedrus.
    B. Maize and wheat.
    C. Date palm and Pinus.
    D. Cedrus and bean.


Section-B
Time : 2 hrs.

31. Give one distinguishing feature of temporary and one
    of permanent adaptation citing one example of each.          2

32. In what way does a plant cell differ from or resemble an
    animal cell with respect to organelles concerned with
    respiration, food manufacture, storage of waste products
    and protective functions?                                    2

33. Draw the internal structure of a chloroplast and label
    granum, stroma, lipid droplet and starch grain.              2

34. In the given sketch fill up the haploid or diploid number
    (n/2n) of chromosomes in the brackets provided.              2

    [Sketch: organism, with branches labelled Mitosis and
    Meiosis and brackets to be filled in]

35. What role do blood corpuscles play in the formation of pus
    in a wound?                                                  2

36. Group the following plants into two groups giving the
    basis of your classification.                                2
    (a) Funaria    (b) Pinus      (c) Selaginella
    (d) Rhizopus   (e) Spirogyra  (f) Cedrus

37. Name two edible fungi and two having medicinal use.          2

38. List four applications or uses of bacteria.                  2

39. Draw a vertical section of the thallus of Marchantia
    showing rhizoids, storage region, photosynthetic region,
    upper and lower epidermis with one air pore.                 3

40. What would be the effect on the osmo-regulation function
    of the organism when
    (a) a marine species of amoeba is transferred to fresh
        water?
    (b) a fresh water species is transferred to sea water?       2

41. A plant has a well developed root system, succulent stem
    and leaves modified into spines. What could be the habitat
    of such a plant and why?                                     2

42. The skeleton of sharks which live at great depths is made
    up of cartilage only. What difference would it have
    made if they had a bony skeleton?                            2

43. Develop a flow chart depicting classification of the
    phylum Chordata into five main sub-phyla. List two
    unique characteristics of this phylum and compare the
    characteristics of fishes and amphibians.                    5

44. Draw an outline diagram showing the transverse section
    of a dicot root and describe the structure and
    function of various tissues.                                 5

Key: Outline Answers and Marking Scheme

Very Short Answer
 1. Irritability
 2. Cristae
 3. Catabolism
 4. Lysosomes
 5. Rudolf Virchow
 6. Amoeba/W.B.C.
 7. Homologous pair
 8. Collenchyma
 9. Gametophytic
10. Cell with higher ratio of surface area to volume
11. Virus
12. Contractile vacuole
13. Alveoli
14. Neurone/nerve cell

Multiple Choice
[Key table for Question Nos. 15 to 30: the entries are not legible
in the source.]


31. (a) In temporary adaptation the individual adapts
        to some specific conditions in the environment/
        cannot be inherited.
    (b) For example, bending of a seedling towards the
        hole when kept in a dark box with a hole.
    (c) In permanent adaptation the species exhibits
        permanent changes over several generations/
        genetic basis.
    (d) For example, fish is adapted to water and
        Opuntia to desert conditions.                            2

32. (a) Presence of cell wall for protection of plant
        cell.
    (b) Presence of chloroplast for food manufacture
        in plant cell.
    (c) Presence of mitochondria in both.
    (d) Presence of vacuole for storage of waste
        product in plant cell. OR any other.                     2

33. Correct labelling and drawing of
    (a) granum.
    (b) stroma.
    (c) lipid droplet.
    (d) starch grain.                                            2

34. (a) Organism: 2n
    (b) Sperm: n
    (c) Egg: n
    (d) Zygote: 2n                                     4 x ½     2

35. (a) W.B.C. are the fighting cells of blood.
    (b) Bacteria attack the cells in the wound.
    (c) W.B.C. of the blood fight with bacterial
        cells.
    (d) Pus is formed by the remnants of the killed
        bacterial cells and W.B.C.                     4 x ½     2

36. (a) Funaria, Selaginella, Rhizopus and Spirogyra.
    (b) Cedrus and Pinus.
    (c) Basis for (a) is flowerless/seedless plants.
    (d) Basis for (b) is flowering/seed plants.        4 x ½     2

37. Any two like:
    (a) Edible: Morchella and Agaricus.                2 x ½     1
    (b) Medicinal: Penicillium and Gibberella.         2 x ½     1

38. Any four like:
    (a) Tanning of leather.
    (b) Fermentation of alcohol.
    (c) Curdling of milk.
    (d) Sewage disposal by decomposition.
    (e) Fixation of atmospheric nitrogen.              4 x ½     2

39. Neat and correct diagram showing correct location of
    (a) Rhizoids.
    (b) Storage region.
    (c) Photosynthetic region.
    (d) Upper epidermis.
    (e) Lower epidermis.
    (f) Air spaces.                                    6 x ½     3

40. (a) Need for elimination of excess water
        absorbed arises.
    (b) Development of contractile vacuoles.
    (c) Need for elimination of excess water stops.
    (d) Contractile vacuoles disappear.                4 x ½     2

41. (a) The plant could be a xerophyte.
    (b) Well developed root system shows lack of
        water available.
    (c) Succulent stem shows the need for storage
        of water.
    (d) Modification of leaves into spines shows the
        need for conservation of water.                4 x ½     2

42. (a) Cartilage is flexible.
    (b) Bone, unlike cartilage, is not flexible.
    (c) It will not allow the fish to endure pressure.
    (d) Thus it will be fatal to the fish.             4 x ½     2

43. Flow chart like the following is expected.

                            Chordates
          __________________|_____________________________
          |            |              |          |        |
    Hemichordata  Urochordata  Cephalochordata  Agnatha  Gnathostomata

    (a) If all sub-phyla are correct.                            1
    (b) Three characteristics:                         3 x ½     1½
        (i) Hollow dorsal tubular nerve cord.
        (ii) Notochord.
        (iii) Gills in the pharynx.
    (c) Fishes and amphibians: description of any
        five:                                          5 x ½     2½
        (i) Both are cold blooded.
        (ii) Presence of jaw in both.
        (iii) Skin in both is slimy but devoid of
              scales in amphibians.
        (iv) Respiration is through gills in fishes but
             in amphibians it is through skin, gills
             or lungs.
        (v) Heart of fishes has one auricle and
            one ventricle but it has two auricles and
            one ventricle in amphibians.                         5

44. (a) Outline diagram.                                         1
    (b) Description of structure and function of
        (i) Epidermis.
        (ii) Cortex.
        (iii) Endodermis and Pericycle.
        (iv) Vascular bundle.                          4 x 1     5

QUESTION-WISE ANALYSIS

[Table giving, for each of Question Nos. 1 to 44: objective tested,
specification, content unit/sub-unit, form of question, marks
allotted, estimated time and estimated difficulty level. The
entries are not legible in the source.]

Note: use abbreviations as follows:
(i) For Objectives: K = Knowledge, U = Understanding,
    A = Application, S = Skill.
(ii) Type of questions: O = Objective, VSA = Very Short-Answer,
     SA = Short-Answer, E = Essay.
(iii) Difficulty level of questions: A = Difficult, B = Average,
      C = Easy.

CHAPTER XV 


MODERATION OF A QUESTION PAPER 


1. Pre-examination Moderation 


The term moderation is used with different connotations:
moderation of questions and question papers, moderation of
marks, moderation of results and even moderation of syllabi.
In this chapter, however, we restrict ourselves to moderation of
a question paper and marks. Moderation means re-examination
of the questions, marking scheme, key, model answers (if given)
and instructions/directions for examinees, besides format and
mechanics of paper setting. The purpose is to ensure quality,
relevance, reliability and practicability of the question paper in a
given situation, i.e. a home examination or a public examination
like the Boards' examination. This exercise is done with the
purpose of improving the question paper in terms of the
requirements reflected in the given design and the blue print of the
question paper.

After going through this chapter the reader should be able
to:

(a) recognise the need for moderation of question papers and
    marks.

(b) identify the various tasks implied in the moderation of a
    question paper.

(c) apply various steps involved in post-examination moderation
    through expert judgement.

(d) use statistical methods for adjusting marks awarded by
    different teachers or evaluators.

(e) recognise the limitations of various moderation procedures
    in improving the quality of question papers, vis-a-vis
    reliability of scores.

[...] class or the [...] examinations conducted by [...]
different person from the [...] also assumed that the moderator
[...] well conversant with the [...] and is quite cognizant of the
[...] which form the basis of framing questions. With a
knowledgeable moderator quality of [...]


2. Tasks Implied in Moderation of a Question Paper

Since the purpose of moderation is to improve the quality of the
question paper, the process of moderation includes a number of tasks
which a moderator has to undertake with respect to different
aspects of a question paper.

(a) Study of the prescribed design for understanding the basic
    framework.

(b) Verification of the blue print developed by the paper setter, to
    judge the inconsistency, if any, with regard to the prescribed
    design.

(c) Editing of each question in terms of the blue print, using
    known criteria of framing a question and making improvements,
    if any. Special care has to be taken in the use of
    directional words like discuss, comment, justify, what do
    you know of, etc., which are often misused.

(d) Checking of key, outline answers and marking scheme to
    judge the correctness of model answers/key in terms of the
    intended response for a question and scoring objectivity.

(e) Judging appropriateness of the organisational pattern of
    consolidating questions in the form of sections or the question
    paper as a whole, from the point of view of ease of
    administration, scoring and interpretation.

(f) Review of general instructions to judge their relevance,
    adequacy of total time and section-wise allotment of time
    (as in case of objective type questions).

(g) Question-wise analysis of the whole question paper to see
    whether the objective tested, topic covered, form of question
    used, marks allotted, difficulty level estimated and estimated
    time for each question tally with the blue print requirement
    vis-a-vis the design prescribed.

(h) Make necessary modifications in questions, instructions
    and format of the question paper in the light of observations
    made in (a) to (g) above.

(i) Read through the final version once again to check for
    language, spellings, serialling of questions and other
    aspects of mechanics of writing, to finalise the paper.
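
The tallying described in task (g) lends itself to a simple mechanical check. The sketch below, in Python, is only an illustration under stated assumptions: the record layout and the function names `build_records`, `tally_by_form` and `mismatches` are hypothetical, and only one column of the question-wise analysis (form of question against marks) is tallied. The question numbers and marks are those of the sample Biology paper in Annexure-B.

```python
from collections import defaultdict
from fractions import Fraction

HALF = Fraction(1, 2)

# Marks by form of question, as laid down in the design of Annexure-B.
DESIGN_MARKS_BY_FORM = {"E": 10, "SA": 25, "VSA": 7, "O": 8}


def build_records():
    """Question-wise records (hypothetical layout) for the sample paper."""
    records = []
    for q in range(1, 15):            # Q.1-14: very short answer, half mark each
        records.append({"q": q, "form": "VSA", "marks": HALF})
    for q in range(15, 31):           # Q.15-30: objective type, half mark each
        records.append({"q": q, "form": "O", "marks": HALF})
    for q in range(31, 43):           # Q.31-42: short answer; Q.39 carries 3 marks
        records.append({"q": q, "form": "SA", "marks": 3 if q == 39 else 2})
    for q in (43, 44):                # Q.43-44: essay type, 5 marks each
        records.append({"q": q, "form": "E", "marks": 5})
    return records


def tally_by_form(records):
    """Add up the marks allotted to each form of question."""
    totals = defaultdict(Fraction)
    for record in records:
        totals[record["form"]] += record["marks"]
    return dict(totals)


def mismatches(records, design):
    """Return {form: (marks found, marks required)} where the two disagree."""
    totals = tally_by_form(records)
    return {
        form: (totals.get(form, 0), required)
        for form, required in design.items()
        if totals.get(form, 0) != required
    }


if __name__ == "__main__":
    print(mismatches(build_records(), DESIGN_MARKS_BY_FORM))
```

An empty result means the paper tallies with the design on this count. The same tally could be repeated for objectives, units and difficulty levels if those columns of the analysis are recorded.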


3. Some Dos and Don'ts

(a) A moderator has no right to modify the design prescribed
    by the examining agency.

(b) A moderator need not alter the blue print unless there is
    some inconsistency in the distribution of questions and marks
    relating to objectives or topics. In that case he is supposed
    to make corresponding changes in the blue print or in the
    distribution of questions.

(c) A moderator has no right to change the weightages given to
    various objectives, units or topics and forms of questions
    reflected in the blue print, which in turn conforms to the
    requirements of the design.

(d) A moderator has no right to change the form of a question,
    but he is free to improve, modify, change or replace a
    question to suit the blue print.

(e) A moderator must estimate the time requirement of each
    question, as also of the whole paper, to judge the adequacy
    or inadequacy of the allotted time. If a separate time limit is
    prescribed for different sections, its appropriateness in
    terms of time requirement should also be estimated and
    adjusted.

(f) A moderator is completely free to modify the key, model
    answer or marking scheme to dovetail it to the intended
    answer of the question.

(g) In case of free options, i.e. the any 5 out of 8 questions type,
    all questions should be judged in terms of difficulty level
    and time requirement to ensure that they are all almost
    equivalent in these two aspects.

(h) In case of internal options of the Either-Or type, the moderator
    should ensure that both questions or options are equivalent
    in terms of objective tested, subject area, difficulty level,
    form of question and time requirement.

(i) A moderator must ensure that the language used in framing
    questions is precise, unambiguous and comprehensible.

(j) A moderator should see to it that the format and organisational
    pattern of the question paper facilitate its printing,
    administration and scoring.

(k) He must review the instructions given for candidates to
    judge their relevance, adequacy and preciseness.


4. Post-examination Moderation


When question papers, marking scheme and other aspects are
finalised and approved, the examinees take the test. Their scripts
are marked and scores are given. At this stage also moderation
of marks is useful. This is not necessary in case of school
examinations, except where a class has a large number of students
in many sections and individual teachers assess their wards
independently, say in an annual examination for promotion
purposes. For public examinations, of course, moderation of the
marks of different evaluators is essential to secure more reliable
grading. Two approaches may be used for this.
4.1. Expert Judgement 


This requires taking advantage of experienced teachers who use
their judgement to adjust marks of different evaluators on the
basis of a 'mental image' of the intended standard (5). This
moderation is based on the overall standard used by an individual
teacher and NOT the actual performance of individual candidates.
It involves inspection by experts of candidates' work
and comparing each teacher's examinees with the 'mental
standard' visualised by the moderator, who is considered to be
an experienced judge with integrity. Training of moderators,
conducting trials for ensuring agreement and arriving at consensus
is the usual process followed in the Mode-3 examination of
the UK (6). This is quite applicable to the Indian situation with
respect to one examining board or more than one examination board
within a state.


The following steps are envisaged for a subject in this type of
moderation:

(a) A chief moderator/chief coordinator, or the paper setter himself,
    is designated by the examining agency to supervise the
    whole operation. Let us call him the chief moderator.

(b) The chief moderator may invite a representative in the subject
    from each of the areas, called a head examiner/additional
    head examiner/coordinator etc. Let us call him a coordinator,
    under whom a number of examiners/markers/evaluators work.
    Let us call them markers of scripts.

(c) Each coordinator is given an identical set (same scripts
    xeroxed) along with the marking scheme/key/outline answers.
    They are given the specific guidelines and principles
    underlying marking or grading by the chief moderator.

(d) Each marker grades independently the sample of scripts
    given to him.

(e) Discussion among markers follows to arrive at consensus
    over the agreement on the marking scheme on the one hand
    and similarity of overall standards of various markers on the
    other. The consensus grade will be the average of the grades
    given by different markers.

(f) Individual assessments are then compared with the consensus
    of the group on the basis of
    (i) overall accordance with the consensus arrived at.
    (ii) spread of grades among the candidates assessed.
    (iii) general leniency or severity observed in marking.
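
Step (e) takes the consensus grade to be the average of the grades given by different markers. A minimal sketch of that arithmetic in Python follows. The function name `consensus_grades` and the sample grades are hypothetical and chosen only for illustration; the text does not say how a fractional average is to be rounded, so the sketch returns the plain average and leaves rounding to the moderators.

```python
from statistics import mean


def consensus_grades(marker_grades):
    """Average the grades given to each script by all markers.

    marker_grades maps a marker's name to the list of grades he gave,
    with every list in the same order of scripts.
    """
    grades_per_script = zip(*marker_grades.values())
    return [mean(grades) for grades in grades_per_script]


if __name__ == "__main__":
    # Hypothetical grades for three scripts from three markers.
    sample = {
        "Marker-1": [2, 4, 1],
        "Marker-2": [3, 4, 2],
        "Marker-3": [4, 4, 3],
    }
    print(consensus_grades(sample))
```

In practice the discussion among markers may settle on a grade other than the arithmetic average; the sketch covers only the averaging rule stated in step (e).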


4.2. Leniency versus Severity in Marking 


Just to illustrate if we take the six students i.c. in place of 10 
or 20 as suggested earlier for sample marking we may have three 
types of situations. 


Candidate Consensus d Markers’ grade ы 
gradc Магкег-1 Marker-2 Marker-3 
1. Amina 3 4 5 2 
2. Karina 1 1 2 1 
3. Marina 5 4 6 4 
4. Rabina 4 4 5 3 
5. Sabina 6 5 6 5 
6. Zarina 2 1 3 2 
Total 21 19 27 17 
Highest grade is 1. Half grade per candidate=6/2=3 
Marker-1
Total (19) falls within ±3 of the consensus total (21), and the
overall standard of this marker can, therefore, be considered
acceptable.


Marker-2
Total (27) does not fall within ±3 of the consensus total (21)
and, therefore, his overall standard is not acceptable. He
errs a lot towards severity and may be considered too
severe.


Marker-3
Total (17) again does not fall within ±3 of the consensus
total (21). He errs slightly towards leniency. Had his
total been 18, his standard would have come within ±3 of
the consensus total (21) and become acceptable.
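
The leniency/severity test can be sketched in a few lines of code. This is only an illustration using the six-candidate table; the function name `leniency_severity`, its verdict labels and the `tolerance_per_candidate` parameter are assumptions made for the sketch, not part of any board's procedure.

```python
def leniency_severity(consensus, marker, tolerance_per_candidate=0.5):
    """Compare a marker's grade total with the consensus total.

    Grade 1 is the highest grade, so a larger total means harsher marking.
    """
    tolerance = tolerance_per_candidate * len(consensus)  # 6 candidates -> 3
    difference = sum(marker) - sum(consensus)
    if abs(difference) <= tolerance:
        return "acceptable"
    return "too severe" if difference > 0 else "too lenient"


consensus = [3, 1, 5, 4, 6, 2]  # Amina, Karina, Marina, Rabina, Sabina, Zarina
markers = {
    "Marker-1": [4, 1, 4, 4, 5, 1],
    "Marker-2": [5, 2, 6, 5, 6, 3],
    "Marker-3": [2, 1, 4, 3, 5, 2],
}
verdicts = {name: leniency_severity(consensus, grades)
            for name, grades in markers.items()}
for name, verdict in verdicts.items():
    print(name, sum(markers[name]), verdict)
```

The tolerance grows with the size of the sample, so the same function could be applied to a sample of 10 or 20 scripts.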




Further discussion of the criteria and marking scheme would
enable the chief moderator to arrive at agreement. It may
sometimes need one or two additional agreement trials to bring
too lenient or too severe examiners/markers within an acceptable
range of the consensus grade.


Spread of Grades of Candidates
Let us consider the following example involving 6 candidates.


Spread of Grades


Candidate      Consensus Grade   Marker's Grade
1. Bharat            2                 1
2. Dasrath           4                 3
3. Janak             1                 4
4. Laxman            3                 4
5. Ram               6                 6
6. Sita              5                 4
Total               21                22


Here the marker's range of grades (1-6) is the same as that of
the consensus grades (1-6) and he, therefore, passes the 'spread
of grades' test. He also passes the leniency/severity test, as his
total (22) is within the accepted range (6/2 = 3.00) of the
consensus grade total (21), i.e. within ±3 of 21. In spite of that,
the marker has not used the grades (1-6) in the same way as the
consensus. For example, grades 2 and 5 are not used and there is
bunching of three candidates (4, 4, 4) into one grade. Thus range
by itself is not sufficient to indicate the acceptability of
marking/grades. This requires another check for confirmation.
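
The point that an identical range can hide unused grades and bunching can be checked mechanically. The sketch below is illustrative only: the name `spread_report` and the threshold of three candidates for "bunching" are assumptions, and Laxman's consensus grade is taken as 3, the value implied by the consensus total of 21.

```python
from collections import Counter


def spread_report(consensus, marker, bunch_threshold=3):
    """Describe how a marker has used the grade scale relative to the consensus."""
    scale = range(min(consensus), max(consensus) + 1)
    counts = Counter(marker)
    return {
        # Same lowest and highest grade as the consensus?
        "same_range": (min(marker), max(marker)) == (min(consensus), max(consensus)),
        # Grades on the scale the marker never awarded.
        "unused_grades": [g for g in scale if counts[g] == 0],
        # Grades awarded to bunch_threshold or more candidates.
        "bunched": {g: n for g, n in counts.items() if n >= bunch_threshold},
    }


consensus = [2, 4, 1, 3, 6, 5]  # Bharat, Dasrath, Janak, Laxman, Ram, Sita
marker = [1, 3, 4, 4, 6, 4]
report = spread_report(consensus, marker)
print(report)
```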


4.3. Check for Conformity 


The main purpose of this check is to compare each grade given
by a marker with the corresponding consensus grade. The
following example illustrates the underlying principle.




Conformity Check


Candidate     Marker's grade   Consensus grade   Difference between
                   (M)               (C)           grades (M-C)
1. Anny             5                 5                 0
2. Banny            3                 4                -1
3. Canny            1                 1                 0
4. Danny            2                 1                +1
5. Nanny            3                 2                +1
                                              Total =  +1


In this case the net difference is +1 for five candidates, and
this is normally accepted for a sample of 20 candidates. The
maximum acceptable difference could be ±2 and should not exceed
that. Tables for samples of various sizes are given by the
Yorkshire Board (1972) for the three tests mentioned above.
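
The conformity check is a candidate-by-candidate subtraction followed by a total, as the following sketch shows. The function name is illustrative, and Banny's consensus grade is taken as 4, an assumption chosen because it is the value that makes the differences add up to the stated total of +1.

```python
def conformity_differences(marker, consensus):
    """Return the difference M - C for each candidate."""
    return [m - c for m, c in zip(marker, consensus)]


marker_grades = [5, 3, 1, 2, 3]      # Anny, Banny, Canny, Danny, Nanny
consensus_grades = [5, 4, 1, 1, 2]   # Banny's 4 is an assumed reading
differences = conformity_differences(marker_grades, consensus_grades)
net_difference = sum(differences)
print(differences, net_difference)
```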

On similar lines it is possible to moderate internal assessments
of different schools with the help of inter-school moderators.
What is important is that the tests, viz. the ‘leniency or
severity test’, the ‘spread of grades test’ and the ‘conformity
check’, if applied, would go a long way in moderating the grades
to ensure maximum comparability vis-a-vis reliability. Sometimes
‘balanced pairing’ is suggested, associating a lenient scorer
with a severe scorer. The assessment of scripts by another
marker is also done sometimes to ensure more objectivity in
scoring. Some boards in the UK use ‘criteria of quality’ which
prescribe the minimum requirement for each grade. For example,
for grade 1 the requirement could be evidence of reasoning, while
for grade 4, say, evidence of recall of facts and events may
suffice.

All the above-mentioned measures for moderation are based
on logic, opinion, experience, judgement and agreement through
discussion among the markers, moderators and others who are
involved in processing. No specific statistical measures are
recommended in arriving at the consensus grade.


4.4. Moderation by Statistical Means


Statistical methods can also be used for adjusting the marks
awarded by different teachers or evaluators. This can be done
by the following methods.


4.4.1. By Use of Moderating Test 

Since teachers' assessments are to be judged on the basis of
‘conformity’ with the results on the moderating test, besides
‘severity’ and ‘spread’, the reference test (moderating test)
must be one on which candidates’ performance could be expected
to be similar to the teachers' assessment being moderated.
The moderating test must be valid, reliable and discriminating so
that candidates' performance vis-a-vis ranking on both the tests,
i.e. the teachers’ assessment and the moderating test, should
remain similar. Therefore, the moderating test should be identical
to the actual assessment. A parallel test may be constructed by
the board or concerned agency and it can be used as a reference
test for fixing the standard of the school. Another way is to use
a common core test based on the prescribed instructional
objectives. Performance on this common test of candidates from
different institutions can help to judge the validity of teachers'
assessments without moderating the marks. Of course this is based
on the assumption that the different schools are following the
same syllabus.


4.5. General Ability Tests 

Another possibility is the use of content-free tests testing
abilities like reasoning, basic mathematical skills, etc. which are
independent of the syllabi followed. Such tests give an idea of
the general mental ability of candidates from various schools and
this enables the moderator to interpret students' performance
on the basis of the general ability test. If there is a significant
difference in the general mental ability scores of students from
different schools, the actual subject-wise scores on teachers'
assessments have to be interpreted in terms of scores on the
general mental ability tests.


4.6. Moderation by Standardisation 


Mean and standard deviation of marks awarded by teachers
and of marks awarded on the moderating test are computed. Then
the marks of each student are standardised to the mean and
standard deviation of the school results on the moderating test
using the following formula:

$$\frac{M_m - X_m}{S_m} = \frac{M_t - X_t}{S_t}$$

Where: $M_m$ is the mean of marks on moderating test
(say 50)
$M_t$ is the mean of marks on teachers’ test
(say 60)
$S_t$ is the standard deviation of marks on teachers' test
(say 5.0)
$S_m$ is the standard deviation of marks on moderating
test (say 7.0)
$X_t$ is the marks of a candidate on teachers' test
(say 40)
$X_m$ is the moderated marks of that candidate.

This formula is given in many texts like the one by Guilford
(7). It is obvious that the teacher was lenient (mean score 60)
compared to the mean of the moderating test (mean score 50). The
spread of scores (5.0) is also not as large as that of the
moderating test (7.0). In the example we can find out the
moderated mark as under:

$$\frac{50 - X_m}{7} = \frac{60 - 40}{5} = \frac{20}{5} = 4$$

$$50 - X_m = 28 \quad \text{or} \quad X_m = 50 - 28 = 22$$

Thus 40 marks of this candidate awarded by the teacher
become 22 moderated marks, which are below 40 just as the mean
score (50) of the moderating test is less than the mean score
(60) of the teachers’ test. This method is useful when a large
number of students are appearing, as in an examination Board,
and their marks are moderated.
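
Moderation by standardisation amounts to keeping each candidate's standard score fixed while changing the mean and standard deviation. The sketch below assumes the illustrative values used in this section (teachers' test mean 60 and S.D. 5.0, moderating test mean 50 and S.D. 7.0); the function name `moderate` and the sample marks are hypothetical.

```python
def moderate(mark, teacher_mean, teacher_sd, mod_mean, mod_sd):
    """Rescale a teacher-awarded mark to the moderating test's mean and S.D.

    Rearranged from (M_m - X_m) / S_m = (M_t - X_t) / S_t.
    """
    standard_score = (mark - teacher_mean) / teacher_sd
    return mod_mean + standard_score * mod_sd


teacher_marks = [40, 60, 65]
moderated = [moderate(x, 60, 5.0, 50, 7.0) for x in teacher_marks]
print(moderated)
```

A candidate exactly at the teacher's mean is moved to the moderating test's mean, and candidates further from the mean are spread out more because the moderating test has the larger standard deviation.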


The above-mentioned methods of moderation of marks are
useful for comparison of candidates’ performance on different
tests used by teachers or evaluators. Nevertheless, for classroom
testing by teachers one may improve to a great extent



the scoring objectivity and comparability if due precautions are
taken at the level of designing the question paper, blue printing,
construction of questions, developing marking schemes and
during editing of questions and their model answers. Occasional
use of common tests in schools can help in surveying the
level of achievement of students. Feedback of performance on
such tests helps the teachers to make up deficiencies through
enrichment or remedial programmes, which in turn helps in
minimising the gap in achievement scores of students from
different institutions. Performance on this common paper of
candidates from different institutions can thus help to judge the
validity of teachers’ assessments without moderating the marks.
Moderation does improve the quality of a question paper
and the accompanying marking scheme, but only to the extent
the quality of the question paper permits. A poorly developed
question paper sets the limits to its improvement unless the
whole question paper is reconstructed. Likewise a poorly
developed blue print sets the limits for moderation and acts as
a limiting factor beyond a particular stage of improvement.
Thus the process of moderation is conditioned by the limits
imposed by the potentialities of the paper built in by the paper
setter.


CHAPTER XVI 


ANALYSIS FOR IMPROVING QUESTIONS
AND QUESTION PAPERS


1. Introduction 


It is an erroneous belief that all is going well with paper
setting at the public examinations. In fact the reality is that
at least 50% of our examinations have at least 5% error of
measurement. This is what the results of Indian examinations
indicate. But how many of us as paper setters know this?
Despite our best efforts to prepare designs and blue prints and
to construct objective-based questions, a paper set by a trained
and experienced subject-specialist may yield unintended results.
What happens is that the various assumptions that are made
during the setting of a question paper may be quite relevant to
certain institutions and less relevant to others. Assumptions
about students’ inputs, instructional inputs and mode of
evaluation might have varied from place to place. As such, the
assumed pre-requisite learning and skills developed during the
course of studies may not have actually developed and the
result may not be what was expected. Therefore, to know how
the paper really generated various types of responses from the
students, it is important to judge the quality of the paper set.
It is through proper analysis that we can find out the areas of
adequacies and inadequacies in students' learning.

It is expected that after going through this chapter the
reader will be able to

(a) recognise the need for analysis of tests and test results.

(b) differentiate between the psychometric and psychiatric
(illuminative or descriptive) approaches.

(c) compute reliability of tests using different methods.

(d) calculate facility index and discrimination index of items
using different formulae for item analysis.

(e) interpret test results in terms of prescribed criteria
objectives.

(f) use item analysis and test results for feedback and
improvement of tests and students’ learning.


2. Why to Analyse? 


It does not suffice to say that a poor or good performance
in the test indicates the quality of students' learning.
This is based on the assumption that the instrument of
evaluation, i.e. the question paper, had the desired qualities of
validity and reliability. In case the question paper lacks the
minimum requirement of measurement qualities, all efforts that go
into the analysis and interpretation of test results are fruitless.
Therefore, the first attempt should be to find out the validity
and reliability of the testing instrument and then, if we find it
adequate, we can proceed to analyse the test results.

At the Board level, a public examination is conducted not only
for grading the students but also for maintaining and improving
standards. It is, therefore, essential that the boards should
know about the performance of students in general and
subject-wise in particular. Unless this is done it is not possible
to identify the strengths and weaknesses of students, which alone
can help the boards to improve evaluation procedures vis-a-vis
instruction, through proper feedback of results to the
institutions. In a way it helps the board in self-evaluation, to
see the extent of congruence or discrepancy between the intended
and the actual outcomes: which instructional objectives
(as laid down in the prescribed curriculum) have been
achieved and which ones need more attention at
the hands of teachers. Therefore, the data may be useful for
effecting curricular changes vis-a-vis improvement in curriculum.

When results are analysed in terms of subjects, it provides a
good evaluation of teachers. The poor performance of students
in a particular subject reflects the instructional inputs.
Knowledge of subject-wise performance at the state level and
institution-wise is necessary if one is interested in teachers’
evaluation. Indeed, heads of institutions do judge a teacher on
the basis of results in their subjects. But they do not get a
comparative view. If proper proformas are developed for
examiners’ reports, analysis of area-wise performance within a
subject is also possible and that would be useful for the teachers
and curriculum developers to redefine and improve the curriculum.
Thus, to my mind, four major purposes of analysis are:


(a) to judge the efficiency of the test. 

(b) to judge teacher effectiveness. 

(c) to provide feedback to institutions to improve teaching 
learning strategies. 


(d) to improve the curriculum and instruments of evalua- 
tion. 


Except for the second purpose, which is better realised at
the school level, the other three are more relevant at the Board
level.


3. When to Analyse? 


Analysis begins in fact at the time of designing the question
paper. Deciding about weightages to the various instructional
objectives, content units and forms of questions in itself involves
identification of significant and preferred objectives, content
units and forms of questions. This is generally based on previous
experience, results or tradition. Anyway, the blue printing of a
question paper further involves analysis of content, on the basis
of which questions are distributed over the various content
units to test different objectives. Likewise, the question-wise
analysis that is undertaken towards the end of paper setting is
another step towards analysis. It is at this stage that the balance
or imbalance of a question paper is judged and modifications, if
any are needed, are made. All types of analyses which
are done before the administration of a question paper are
based on logic, the subjective opinion of the setter or the
judgements of a panel involved at one or the other stage. This
sort of judgemental validity is a necessary but not a sufficient
condition for ensuring the needed validity or reliability, as it is
based only on experts’ opinion and not on any empirical evidence.
It is for this reason that we need to know the empirical evidence
on the basis of which sound judgements could be formed.

The time for analysis of test results vis-a-vis the question
paper is after administration of the paper. When the scripts
are marked and the results are prepared, the board can undertake
various types of analyses depending upon the purpose and
the use to which those results are to be put. This work can be
facilitated by developing a structured proforma for the markers
(examiners), who may be asked to collate some information
about the scripts they mark in the manner indicated in that
proforma. Another time for analysis is after the declaration of
the results. This is normally done to study the nature of errors
committed by the students in ... Analysis
is also possible at a later stage as a part of allied research
projects in which data from the same board or other boards of
... for appraisal of comparative performance of students.
Nevertheless, the minimum that is expected ... is to get the
results analysed subject-wise to see how a
question paper behaves ... received the
question paper.


4. Who Can Analyse 


Depending upon the purpose and timing of analysis the
teachers, the examiners, paper setters, board officials,
researchers and even students can undertake analysis. Students
reflect much better on the choice of questions made by the
paper setter ... area covered (content-validity) ...
Teachers ... the probable reasons
... giving their experience-based opinion on the
vagaries of questions ... It is they who can
identify the magnitude and nature of errors committed by
students. It is they alone who can opine on the types of errors
students make and the errors which can be traced to defective
questions or to instructional deficiencies. Paper setters can
self-evaluate for adequacies or inadequacies in their paper
setting on the basis of analysis and reports received from the
markers (examiners).

Board officials can undertake analysis of results in terms of
identification of areas of curriculum improvement as well as
the quality of measuring instruments. Researchers can go deeper
to analyse the causes of failures, underachievement, low
performance and the behaviour of different items or questions.
They can even probe deep to judge the suitability of various
concepts and their placement in the curriculum. Thus, depending
upon the interest and administrative decisions, different types of
persons can be commissioned for analysis of results.


5. Which Approach to be Used?


There are basically two approaches which can be used. One is
the traditional, classical psychometric approach, which
emphasises precision and prediction and is rooted in the belief
that results can be analysed and interpreted in quantitative
terms irrespective of the context or antecedents that might have
influenced the students’ performance. Use of statistical
techniques in processing and interpretation of the results is the
usual mode of analysis. This is the approach which suits analysis
of Boards' results, where the purpose is mainly to measure
students’ achievement and to classify and grade students rather
than improve their achievement. Therefore, the judgements that
are to be made are norm-referenced, where students’ performance
is judged in terms of deviation from the norm. Students are
declared as pass, fail, underachievers or deviants. Judgement in
terms of predetermined criteria or intended outcomes in the form
of instructional objectives is relegated to the background.
The other approach is the descriptive, psychiatric, narrative
or illuminative approach, which is embedded in the
social-anthropological paradigm and emphasises the description and
interpretation of results for the purpose of improvement of
students' learning. The focus is more on diagnosis and narration
of evidences, and it takes into account the antecedents, inputs
and processes which affect students’ learning and achievement.
The interpretations are made more in terms of the student himself
and with reference to the criteria or intended learning outcomes,
to highlight adequacies and inadequacies in learning. Therefore,
self-referenced and criterion-referenced judgements are
preferred to norm-referenced judgements, as the emphasis is on
improvement of achievement. But this approach is suitable only
at the school (not board) level, where the teachers through
classroom testing aim at improving students’ learning. As such
the teachers can and should analyse classroom tests using this
approach so that they can not only portray the students’
achievement in terms of factors and conditions affecting their
achievement but also identify the gaps and inadequacies in the
teaching-learning process. Which of the two approaches may be
used to form judgements depends on the purpose and focus of
measurement.


6. How to Analyse? 


Since the boards are concerned ...
it is safer to assume that the psychometric approach is
easier to follow. However, the first thing is to judge the
quality of the measuring instrument itself. If the instrument
does not itself reflect adequate reliability, further analysis of
data that accrue from the use of such a tool is of no avail.


6.1. (a) Finding Reliability of the Test


There are a number of methods to calculate reliability.
In case of question papers having items or questions to
be marked on a 1-0 scale, as in the case of ...
very short answer questions, one can use the ...
method or the Kuder-Richardson formula ..., for
which the reader can refer to any book ...
For an easy-to-use formula ...
appropriate and gives good approximation ...
given as under:


(i) J.L. Saupe (2)

$$r_{xx} = 1 - \frac{\ldots\,K}{(S.D.)^2}$$

Where $r_{xx}$ is the reliability coefficient
$(S.D.)^2$ is the variance of scores
$K$ = number of items on the test


(ii) McMorris (3)
Another formula is by McMorris (1972) which is fairly
accurate.

$$\text{Index of measurement efficiency (IME)} = 1 - \frac{\ldots}{(S.D.)^2}$$

Where:
$K$ is the number of items in the test, and
$(S.D.)^2$ is the variance of scores.
For quick calculation of standard deviation, teachers can
make use of the following formula by Paul Diederich: (4)

$$S.D. = \frac{\text{Sum of highest } \tfrac{1}{6} \text{ of scores} - \text{sum of lowest } \tfrac{1}{6} \text{ of scores}}{\tfrac{1}{2} \text{ of the total number of students}}$$

When the number is small use N-1 in place of N.

Test-retest reliability is not possible in a public examination
as the same test cannot be given twice. In case of an essay test
the calculation of reliability is a difficult problem. It can be
planned for through well structured questions, ensuring uniformity
of scoring by means of a detailed marking scheme and
instructions to the examiners. An attempt should be made to
present uniform tasks when framing the questions. If the questions
are well structured and balanced in terms of objectives or the
abilities tested, there is not much difficulty in using the split
half method of finding out the reliability of essay papers. With
optional questions, especially when a free option is given, the
difficulty of splitting the test into two halves is further
enhanced. However, if all questions are compulsory or balanced
internal options are provided, say in a six-question essay test,
we can obtain separate scores for each half of the test, i.e. for
the first, third and fifth questions and also for the second,
fourth and sixth. The correlation coefficient between the two
halves can then be computed, followed by correction using the
Spearman-Brown formula as illustrated below:




(iii) Spearman-Brown Correction Formula (5)

$$r_{xx} = \frac{2\,r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}}$$

Where $r_{xx}$ = reliability of the total test
$r_{\frac{1}{2}\frac{1}{2}}$ = reliability you obtained by
correlating half of the test with the
other half (suppose 0.6)


Example
Suppose $r_{\frac{1}{2}\frac{1}{2}} = 0.60$

Then using the above formula we have

$$r_{xx} = \frac{2(0.60)}{1 + 0.60} = \frac{1.20}{1.60} = 0.75$$
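
The split-half procedure for a six-question essay test (odd questions against even questions, then the Spearman-Brown correction) can be sketched as follows. The marks of the five scripts are invented purely for illustration, and the helper names `pearson`, `spearman_brown` and `split_half_reliability` are assumptions of this sketch.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(scripts):
    """scripts: one list of question marks per student, all questions compulsory."""
    odd_half = [sum(marks[0::2]) for marks in scripts]   # questions 1, 3, 5
    even_half = [sum(marks[1::2]) for marks in scripts]  # questions 2, 4, 6
    return spearman_brown(pearson(odd_half, even_half))


# Hypothetical marks of five students on six essay questions.
scripts = [
    [8, 7, 9, 6, 8, 7],
    [5, 6, 4, 5, 6, 5],
    [9, 9, 8, 9, 7, 8],
    [3, 4, 5, 3, 4, 4],
    [6, 5, 7, 6, 6, 7],
]
reliability = split_half_reliability(scripts)
print(round(reliability, 2))
```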


(iv) Kuder-Richardson Formula-21 (6)
Using simplified Kuder-Richardson Formula-21 we have:

$$r_{xx} = 1 - \frac{\bar{X}\,(K - \bar{X})}{K\,(S.D.)^2}$$

Where $\bar{X}$ = mean
$K$ = number of items in the test
$(S.D.)^2$ = variance (standard deviation squared)

Example
Suppose $\bar{X} = 50$, S.D. = 6, K = 60 items
Then:

$$r_{xx} = 1 - \frac{50\,(60 - 50)}{60\,(6)^2} = 1 - \frac{50 \times 10}{60 \times 36} = 1 - \frac{500}{2160}$$

Therefore, $r_{xx} = \frac{83}{108} \approx 0.77$
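
The KR-21 arithmetic above is easy to check by machine. The function name `kr21` is an assumption of this sketch; the inputs are the values of the worked example.

```python
def kr21(mean, sd, n_items):
    """Simplified Kuder-Richardson Formula-21 reliability estimate."""
    return 1 - (mean * (n_items - mean)) / (n_items * sd ** 2)


reliability = kr21(mean=50, sd=6, n_items=60)
print(round(reliability, 2))
```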


Another way to compute the reliability of a test is to give two
tests of the same ability and correlate the two sets of scores.
This is true of essay tests also, in which questions receive
different numbers of points. K.R. Formula-20, applicable to
objective type questions, is given below:


(v) K-R Formula-20

$$r_{xx} = \frac{n}{n-1}\left[1 - \frac{\sum pq}{\sigma^2}\right]$$

Where:
$n$ = number of items in the test
$p$ = proportion of students who answer the item correctly
$q$ = proportion of students who answer the item wrongly = $(1 - p)$
$pq$ = variance of a single item scored as right/wrong
$\sigma^2$ = variance of the total scores
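
A small sketch of K-R Formula-20 for items scored right/wrong is given below. The response matrix is invented for illustration, the function name `kr20` is hypothetical, and the variance of total scores is computed by dividing by the number of students (an assumption, since the text does not say whether N or N-1 is intended).

```python
def kr20(responses):
    """responses: one list of 0/1 item scores per student."""
    n_students = len(responses)
    n_items = len(responses[0])
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_students  # proportion correct
        sum_pq += p * (1 - p)                                 # item variance pq
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_students
    variance = sum((t - mean_total) ** 2 for t in totals) / n_students
    return (n_items / (n_items - 1)) * (1 - sum_pq / variance)


# Hypothetical answers of five students to four objective items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(responses)
print(round(reliability, 2))
```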


(vi) Cronbach co-efficient alpha

Essay type examinations: Co-efficient alpha developed by
Cronbach (1951) is a generalisation of the KR-20 formula (7) for
use when items are not scored dichotomously, i.e. right/wrong.
For papers in which all questions are compulsory the following
formula is usable:

$$\alpha = \frac{n}{n-1}\left[1 - \frac{\sum_{i=1}^{n}\sigma_i^2}{\sigma^2}\right]$$

Where $\sigma_i^2$ is the variance of a single item. Individual
question variance may be found out, and the approximation S.D.
calculation formula can be used. Where choice is
provided the Nuttall (1969) formula can be easily applied.
k 


1 k ~ | 
EE SI 1—1 (,3 n; sj? ) | s? B s] 


Where: $l$ = number of questions to be attempted
$k$ = number of questions
$n_j$ = number of students who answered question j
$s_j^2$ = variance of scores on question j
$s^2$ = variance of total scores.
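
For a paper in which all questions are compulsory, coefficient alpha can be computed directly from the question-wise marks. The sketch below uses invented marks for four students on three questions; the function names are hypothetical and variances are computed by dividing by the number of students, which is an assumption of the sketch. It does not cover the case of optional questions.

```python
def variance(values):
    """Variance with divisor N (an assumption of this sketch)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def coefficient_alpha(marks):
    """marks: one list of question marks per student, all questions compulsory."""
    n_questions = len(marks[0])
    question_variances = [variance([row[q] for row in marks])
                          for q in range(n_questions)]
    total_variance = variance([sum(row) for row in marks])
    return (n_questions / (n_questions - 1)) * (
        1 - sum(question_variances) / total_variance)


# Hypothetical marks of four students on three essay questions.
marks = [
    [4, 5, 3],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
]
alpha = coefficient_alpha(marks)
print(round(alpha, 2))
```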


6.2. Item Analysis


Simplified methods

The purpose of item analysis is to evaluate instructional
effectiveness, to estimate individual and group attainment through
information about students' performance on each item, and to
improve test quality, i.e. to check whether the items are doing
what they are supposed to do. For this, two important
characteristics to be found out are the Facility Value (F.V.) and
the Discrimination Index (D.I.) of each item or question. The
following methods can be used:


(a) Facility Value (F.V.)


A. Objective type items

(i) $$\text{F.V. of an item} = \frac{\text{No. of students answering the item correctly} \times 100}{\text{Total number attempting the item}}$$

e.g. $$\frac{40 \times 100}{50} = 80 \text{ (F.V.)}$$

(a) If number of students is less than 40 take the whole
population.

(b) If number of students is between 40 and 100, top 27%
and bottom 27% may be considered.

(c) If number of students is beyond 100 then top-bottom
10% will be sufficient.


(ii) Facility index can also be calculated by counting the
number of students who attempt the item correctly from both
the upper and the lower groups, i.e. 27% from each, and using
the following formula:

$$\text{Facility index} = \frac{\text{Number of correct answers in both the groups}}{\text{Total number of papers in both groups}}$$


             High Scoring (H)   Low Scoring (L)   Total
Correct             8                  3            11
Incorrect           2                  7             9
Total              10                 10            20


Facility index = 11/20 = 0.55
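
Both calculations for objective type items reduce to a simple proportion, as sketched below. The function names are illustrative; the low scoring group's count of correct answers is taken as 3, the value implied by the row total of 11.

```python
def facility_value(n_correct, n_attempted):
    """Percentage of those attempting the item who answered it correctly."""
    return 100 * n_correct / n_attempted


def facility_index(correct_high, correct_low, n_high, n_low):
    """Proportion of correct answers in the combined high and low groups."""
    return (correct_high + correct_low) / (n_high + n_low)


fv = facility_value(40, 50)
fi = facility_index(correct_high=8, correct_low=3, n_high=10, n_low=10)
print(fv, fi)
```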


B. Essay-type (Multiple-point items)
(i) D.R. Whitney and D.L. Sabers. We can calculate as under:
(a) Arrange papers from highest to lowest scoring students
(Total Score).
(b) Highest 27% (or 25%) and lowest 27% (or 25%) are
taken out for item analysis.
(c) Calculate sum of points made on each item for each of
the two groups.
(d) Use the formula given below: (8)

$$\text{Index of difficulty} = \frac{\Sigma S_H + \Sigma S_L - (n_T)(X_{min})}{n_T\,(X_{max} - X_{min})}$$

Where:
$\Sigma S_H$ = Sum of the FX column for high scoring group
$\Sigma S_L$ = Sum of the FX column for low scoring group
$X_{max}$ = Maximum possible score on the item
$X_{min}$ = Minimum possible score on the item
$n_T$ = Total number of papers in the combined high and
low groups.

Example
TABLE-1
 High Scoring group (H)          Low Scoring group (L)
 Item score    F     FX          Item score    F     FX
     5         2     10              5         0      0
     4         3     12              4         1      4
     3         4     12              3         4     12
     2         1      2              2         2      4
     1         0      0              1         2      2
     0         0      0              0         1      0
 Total:       10     36          Total:       10     22


Using the formula we have:

$$\text{Facility Index} = \frac{36 + 22 - (20)(0)}{20\,(5 - 0)} = \frac{58}{100} = 0.58$$

The facility index is equal to 0.58, which means that the
class average is 58% of the difference between the lowest and
the highest possible class average.
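
The index for multiple-point items can be computed straight from the two frequency tables. In this sketch each group is represented as a mapping from item score to frequency; that representation and the function names are assumptions made for illustration.

```python
def sum_fx(freq_by_score):
    """Sum of the FX column: score multiplied by frequency, added up."""
    return sum(score * freq for score, freq in freq_by_score.items())


def multipoint_facility(high, low, x_max, x_min=0):
    """Index of difficulty for an essay-type (multiple-point) item."""
    n_total = sum(high.values()) + sum(low.values())
    numerator = sum_fx(high) + sum_fx(low) - n_total * x_min
    return numerator / (n_total * (x_max - x_min))


# Frequencies from Table-1: item score -> number of papers.
high_group = {5: 2, 4: 3, 3: 4, 2: 1, 1: 0, 0: 0}
low_group = {5: 0, 4: 1, 3: 4, 2: 2, 1: 2, 0: 1}
index = multipoint_facility(high_group, low_group, x_max=5, x_min=0)
print(sum_fx(high_group), sum_fx(low_group), index)
```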


(ii) Morrison index

$$\text{F.V. of a question} = 50 + M_Q - M_T$$

Where: $M_Q$ = Mean percentage mark on that question for all
candidates
$M_T$ = Mean ability in percentage total mark of those
who attempted the question.

(a) use entire population if up to 40.
(b) use top-bottom 27% if 40 to 100.
(c) use top-bottom 10% if population is > 100.


(iii) Willmott-Nuttall

$$\text{F.V.} = M + M_Q - M_T$$

Where:
$M_Q$ = Same as above
$M_T$ = Same as above
$M$ = Mean % marks.


(iv) Edwin Harper

$$\text{F.V.} = \frac{M_U + M_L}{2 \times \text{Max. marks for the question}} \times 100$$

Where:
$M_U$ = Mean mark of top 1/5th of students on the question
$M_L$ = Mean mark of bottom 1/5th of students on the question


(b) Discrimination index

Index of discrimination is an important indicator of item
effectiveness for any discriminating test. An item which is
attempted correctly by students who have good command of the
objectives measured by the total test, and which is failed by
students who do poorly on the total test, is considered to
contribute in a positive way to the ability of the test to
discriminate accurately among different levels of students'
performance, vis-a-vis the relationship of the test scores. The
index of discrimination can be calculated by dividing test papers
into high scoring and low scoring groups.

Following are some of the formulae in use:

(i) $$\text{Index of discrimination} = \frac{R_H - R_L}{N_H}$$

Where: $R_H$ = number of right answers in high scoring group.
$R_L$ = number of right answers in low scoring group.
$N_H$ = number of papers in either high (H) or low (L)
scoring group.
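
As an illustration of formula (i), the sketch below reuses the counts from the earlier high/low table for an objective item (8 correct in the high group and, as implied by its totals, 3 correct in the low group, with 10 papers in each). The function name is hypothetical.

```python
def discrimination_index(right_high, right_low, n_group):
    """(R_H - R_L) / N_H for equal-sized high and low scoring groups."""
    return (right_high - right_low) / n_group


d_index = discrimination_index(right_high=8, right_low=3, n_group=10)
print(d_index)
```

An item answered correctly by equal numbers in both groups would give an index of zero, and one answered correctly more often by the low group would give a negative index.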


(ii) Johnson upper-lower index
D.I. = (F.V.) higher ability 27% - (F.V.) lower ability 27%. This
is quite useful for classroom tests.

(a) if strength is < 40, grouping of 50%-50%.

(b) if strength is between 40 and 100, grouping of 27%-27%.

(c) if strength is > 100, grouping of 10%-10% can be used.


(iii) For choice type (Essay)
Harper’s Formula

$$D.I. = \frac{1.8\,(M_U - M_L)}{\text{Max. marks for the question}}$$

Where: $M_U$ = mean mark of upper 1/5th scores
$M_L$ = mean mark of lower 1/5th scores


(iv) D.I. is the product moment correlation between marks on
the question and the total marks.

$$D.I. = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{N\,S_x\,S_y}$$

Where: $x$ = mark on question (whose D.I. is to be found) of a
given student
$y$ = total marks of the same student
$N$ = number of students
$S_x$ = s.d. of x
$S_y$ = s.d. of y


(v) Whitney and Sabers formula

$$\text{Index of discrimination} = \frac{S_H - S_L}{n_H\,(X_{max} - X_{min})}$$

Where:
$S_H$ = sum of the FX column for the high scoring group
$S_L$ = sum of the FX column for the low scoring group
$n_H$ = number of papers in the high scoring group (or low
scoring group)
$X_{max}$ = maximum possible score on the item
$X_{min}$ = minimum possible score on the item

In the previous example (Table-1) $S_H = 36$, $S_L = 22$,
$X_{max} = 5$, $X_{min} = 0$, $n_H = 10$

Therefore, $$D.I. = \frac{36 - 22}{10\,(5 - 0)} = \frac{14}{50} = 0.28$$
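
The same Table-1 sums give the discrimination index for the multiple-point item. The sketch below simply evaluates the Whitney and Sabers expression; the function name is an assumption.

```python
def multipoint_discrimination(sum_high, sum_low, n_group, x_max, x_min=0):
    """(S_H - S_L) / (n_H * (X_max - X_min)) for an essay-type item."""
    return (sum_high - sum_low) / (n_group * (x_max - x_min))


d_index = multipoint_discrimination(sum_high=36, sum_low=22,
                                    n_group=10, x_max=5, x_min=0)
print(d_index)
```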


Depending upon the purpose and nature of data that accrue
from a testing situation one may use the relevant formula to
compute facility indices and discrimination indices. Individual
items may be very good when framed but when grouped together
in the form of a test they may behave differently. An item
having a very good (50%) facility index may come out as a poor
discriminator with a low discrimination index. If a board
undertakes this sort of analysis to find out the quality and
behaviour of each question then it is in a better position to take
necessary steps for the next year to:


(a) improve the quality of questions.

(b) improve the validity and reliability of the testing
instrument.

(c) give feedback to the item writers/paper setters to improve
their skills.

(d) give feedback to the schools to improve their classroom tests.

Besides the question and the question paper, equally
important is the need to analyse responses of students. What
are the types of mistakes students make? What are their
strengths? What type of questions do they attempt more
satisfactorily? What are the instructional objectives which demand
more appropriate answers? What are the content areas in which
students’ performance is not satisfactory? On which form of
questions do students score more compared to other forms?
Which questions made it difficult for examinees to
know the nature and scope of expected responses? These and
many more questions can be identified, the answers to which
can be found by analysing students’ responses. Besides
analysis of students’ responses, the test results should also be
analysed. Frequency distribution of scores, mean, median and
mode, dispersion etc. are the basic requirements which can be
easily applied to all varieties of tests, theory and practical, so
that the scores can be transformed and made comparable.
How rigorous the techniques used should be is up to the
Boards. At the least, a simple transformation of raw scores into
Z-scores is quite easy to apply, and on that basis we can
convert marks into T-scores also. The underlying
emphasis is to ensure comparability of marks in theory vs
practical, external vs internal, written vs oral, History vs
Mathematics etc. This involves only an administrative decision and
a statistical treatment of data.
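The transformation of raw scores into Z-scores and T-scores mentioned above can be sketched as follows. The sketch assumes the conventional definitions, z = (raw score - mean) / standard deviation and T = 50 + 10z, and uses the population standard deviation; the marks shown are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Standard scores: distance of each raw score from the mean in SD units."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)          # population SD; an assumption of this sketch
    if s == 0:
        raise ValueError("all scores are equal; z-scores are undefined")
    return [(x - m) / s for x in raw_scores]


def t_scores(raw_scores):
    """T-scores: z-scores rescaled to a mean of 50 and an SD of 10."""
    return [50 + 10 * z for z in z_scores(raw_scores)]


# Hypothetical marks of the same five pupils in a theory paper and a practical.
theory = [35, 42, 50, 58, 65]
practical = [18, 19, 20, 21, 22]

print([round(t, 1) for t in t_scores(theory)])
print([round(t, 1) for t in t_scores(practical)])
```

After the transformation both sets of marks have the same mean and spread, so a pupil's standing in theory and in the practical can be compared on one scale.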


7. Using Results of Analysis 


The simple analysis of students’ performance which can be
utilised with profit pertains to the analysis of test results in 
terms of: 


(a) instructional objectives. 

(b) content areas. 

(c) students’ errors. 

(d) plausibility of distractors (multiple-choice items).
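Item (d) above can be illustrated by tabulating how often each option of a multiple-choice item is chosen by the high-scoring and the low-scoring examinees. This is a minimal sketch: the 27% grouping and the rule used to flag a weak distractor (chosen by nobody, or chosen more often by the upper group) are assumed rules of thumb, and all names and data are hypothetical.

```python
from collections import Counter


def distractor_table(responses, totals, key, options="ABCD", fraction=0.27):
    """Count choices of each option in the upper and lower scoring groups."""
    n = len(responses)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: totals[i])   # lowest to highest total score
    lower = Counter(responses[i] for i in order[:k])
    upper = Counter(responses[i] for i in order[-k:])
    return {
        opt: {"upper": upper[opt], "lower": lower[opt], "is_key": opt == key}
        for opt in options
    }


def weak_distractors(table):
    """Wrong options that attract nobody, or attract the upper group more than the lower."""
    weak = []
    for opt, row in table.items():
        if row["is_key"]:
            continue
        unused = row["upper"] + row["lower"] == 0
        misleading = row["upper"] > row["lower"]
        if unused or misleading:
            weak.append(opt)
    return weak


# Hypothetical responses of ten examinees to one item whose correct answer is "A".
responses = ["B", "C", "A", "B", "A", "C", "A", "A", "A", "B"]
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

table = distractor_table(responses, totals, key="A")
print(table)
print(weak_distractors(table))   # ['D']
```

In this example option D is chosen by no one in either group, so it does no work as a distractor and is a candidate for revision.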


Once the analysis is done carefully, the data can be used for
the following purposes:


1. To judge the efficacy of the various instructional objectives.
2. To discover the abilities (intended learning outcomes)
which are not properly developed.
3. To identify the content areas which may be redefined or
omitted from the curriculum.
4. To take decisions about the grade placement of certain
concepts.
5. To suggest possible teaching-learning strategies for
improving instructional practices.
6. To provide guidelines to the institutions for improving
evaluation practices.
7. To undertake further diagnostic studies for detailed
analysis of errors.


9. Use of Item Analysis 


The discrimination index indicates defective and ineffective
items, thereby providing useful feedback to the item writer to
improve his items. Over the years a teacher can develop a pool
of individual items which can be typed on 10cm x 15cm cards
along with the item indices. The labour of preparing good items
decreases after a few term tests as the main job of the teacher
would then be that of selecting, revising and assembling items
rather than preparing new items for the test. The item selector
is often tempted to discard defective items, although
improvement is possible in most of these items if care is taken. A
study by Lehmann and Mehrens indicated that revising an
item takes only 1/5th of the time needed to write a new item.
It is, therefore, necessary that more effort be made to improve
the existing items and, when replacement is to be made, the
concept and objective originally tested by the item should remain
the same.

The characteristics of the items change drastically after 
the curriculum changes. The facility indices and discrimination 
indices change from class to class. Therefore, a teacher should 
be cautious in interpreting the data from a single item as
an indicator of an individual student’s achievement.

Paper setting is only meant for collecting information about 
students’ learning. Improving the validity and reliability of the 
evaluation instrument is only an attempt at getting relevant, 
adequate and dependable evidence. How best this evidence can
be used to improve students’ learning, instructional strategies,
measuring instruments and management of examinations is
equally important; for this, the concept of evaluation as feedback
must be appreciated. Once the ecology of testing is realised by
the boards in particular and teachers in general, more and more
emphasis would be laid on the feedback aspect of evaluation to
bring about improvement in instructional objectives, content of
education, methodology of teaching, technique of testing and
even management of examination. This would, however, not be
possible unless the teachers in general and managers of exami- 
nations in particular take cognizance of the role of analysis 
and feedback in the total evaluation system. 


CHAPTER XVII 


EVALUATION TRENDS IN 10+2 PATTERN
OF SCHOOLING


1. Introduction 


The curriculum for the ten-year school prepared by the 
N.C.E.R.T. (1975) has provided the basic framework for
improving the quality of education at different stages of school 
education. It is for the first time that guidelines for a broad 
based curriculum have been suggested keeping in view the 
societal needs for general education. This framework, popularly 
known as the N.C.E.R.T. approach-paper on curriculum, (1) has 
been widely discussed at the national and state levels. By and 
large educationists, curriculum specialists and teachers have 
accepted this pattern of schooling as the most relevant frame- 
work for educational reconstruction. 
In this chapter an attempt is made to highlight trends of
evaluation in the context of the 10+2 pattern of schooling. The
reader after going through this chapter should be able to:

(a) identify the assumptions underlying the ten-year
curriculum.

(b) formulate key concepts forming the framework of the new
curriculum.

(c) trace various implications of evaluation in the context of
curriculum intentions.

(d) make generalisations about the changed emphasis in the
evaluation process and the system as envisaged in the
curriculum.


(e) analyse the philosophical, social, psychological, scientific 
and economic bases of trends in educational evaluation. 

(f) analyse the conditions and developments in the field of 
education that have bearing on trends in evaluation. 

(g) infer various trends relating to different aspects of 
evaluation. 


Since this section is aimed at developing a framework 
for evaluation it is not necessary to discuss the highlights of the 
new curriculum. However, the framework of the curriculum 
includes evaluation as its integral part. As such, the evaluation 
system is embedded in the curriculum roots of the new pattern
of schooling. Assumptions underlying the new curriculum
provide the basis for identifying the assumptions of the
framework of evaluation. It is, therefore, necessary to state
these assumptions in clear-cut terms. 


2. Assumptions 


Though the approach-paper does not list as such the under- 
lying assumptions, it is obvious from the content of the docu- 
ment that the disciplines of the curriculum, the objectives, the 
methodology of teaching and the mode of evaluation as reflected 
in this document, appear to be based on the following 
assumptions: (2) 


(a) A good curriculum is one which fosters a national identity 
and reflects national development. 

(b) Uniformity of the curriculum would help to reduce the
gap between standards of education among the different
states.

(c) Curriculum renewal at regular intervals is essential to meet
the demands of new knowledge and the needs of the
society.

(d) Teaching of mathematics and science is essential for every 
citizen to develop a rational outlook. 

(e) Teaching of work-experience is essential to reduce the gap 
between the elites and the masses by relating education to 
the world of work. 


(f) Language is a powerful tool for developing a catholicity 
of outlook and a means to cultivate the basic value of 
culture. 

(g) Artistic experience, health and physical education and 
character-building are essential components for self-actua- 
lization of an individual. 

(h) Flexibility in curriculum provides for better individual 
development and motivation. 


3. The Key Concepts 


It is for the first time that Science, Mathematics, Work-Experi- 
ence, Health and Physical Education have been introduced as 
compulsory subjects as a part of general education for the ten- 
year schooling. Multiple-entry, the semester system, the unit 
approach, advanced-level units, evaluation and feedback etc., 
are some of the new concepts which are stressed in the new 
curriculum. It will not be out of place to suggest here the five
key concepts which form the warp and woof of the new
curriculum. These may be termed compositeness, adaptability,
integration, learnability and feedback, which are further explained below.


3.1. Compositeness 


It refers to the total picture of the disciplines of the curriculum 
at various stages of education and the different components of 
curriculum. Emphasis in the new curriculum would, therefore,
not be limited to the content of the curriculum but would extend to all
the four major curriculum components, namely: objectives of
the curriculum, content of the curriculum, the methodology of
the curriculum and evaluation of the curriculum. Development
of the curriculum of a particular discipline is, therefore,
envisaged in accordance with the total curriculum plan.


3.2. Adaptability 


This term refers to the flexibility in curriculum
development, implementation, and evaluation of pupils. It
is visualised that each state will have the freedom to formulate
the curriculum objectives, curriculum content, curriculum
strategy and curriculum evaluation in accordance with their 
educational antecedents, expected inputs and the needs of the 
state. It calls for emphasis on the individual, learner-based 
methodology of teaching and mode of assessment. It highlights 
the need for individualised instruction and individual-based 
evaluation. The unit-teaching approach would be the basis for 
instructional inputs which would emphasise self-based learning 
and self-referenced evaluation. 


3.3. Integration 


Integration is viewed in terms of articulating the curricula of 
different stages of school education as well as the various
disciplines of the curriculum. It also refers to integration of 
teaching and testing for adopting the unit approach of teaching 
and testing. Likewise, organisation of learning experiences and 
performance measures would be objective-based. The emphasis 
would be on the need for appreciating the dynamic interaction 
between objectives, content, teaching-learning strategies and 
evaluation. 


3.4. Learnability 


This aspect emphasises the need for cultivating the habit of
self-learning and self-evaluation, development of intellectual skills and
cultivation of a rational outlook. Emphasis on concept
attainment, principles and generalization would find a more
prominent place in the educational process. Factual information
and knowledge of specifics would no longer be emphasised.


3.5. Feedback 


This aspect refers to providing pupils the information about 
adequacies or inadequacies in their learning. Diagnosis of 
pupils’ difficulties, application of correctives, identification of 
instructional deficiencies and efficacy of concepts and objectives 
would attract more attention on the part of teachers and
evaluators.


From the above-mentioned key concepts, one can easily infer 
that the emphasis in the new curriculum is on understanding the 
closely inter-related curriculum components, thereby emphasis- 
ing the need for making evaluation integral to the total process 
of curriculum development. 


4. Implications for Evaluation 


With reference to the first concept, it can be assumed that when
evaluation is considered an integral part of the teaching-learning
process it brings quality control into the whole process of
education. Therefore, in the new curriculum, the emphasis would
shift from pupil evaluation to curriculum evaluation. Likewise,
evaluation would be extended from the cognitive area to all
the non-scholastic areas.

When evaluation is done in terms of learners, rather than 
the class, it provides more valid measures of students' learning 
and a more reliable basis for adapting instruction to the needs 
of the learner. Accordingly, the emphasis would shift from
achievement tests to diagnostic tests, from mixed-ability teaching
to individualized instruction and from norm-referenced testing to
criterion-referenced testing.

When evaluation is based on the objectives of the instruc- 
tion and the learning experiences provided in a unit, it provides 
evidence about pupils’ learning that is likely to be more
reliable. This is based on the assumption that the teacher who 
teaches is the best judge of his students. This would lead to the 
use of a wider variety of tools and techniques, cooperative 
evaluation, objective-based and integrative teaching and testing. 

In terms of feedback, it is assumed that regular feedback
about pupils’ achievement would improve that achievement.
Accordingly, frequent evaluation at regular intervals, unit-wise
teaching and testing, diagnosis of pupils’ weaknesses, remedial
instruction, proper interpretation and reporting of results
immediately after evaluation will be the more sought-after activities of
the teachers.

When evaluation is considered an integral part of the
teaching-learning process, evaluation must be relevant to the
objectives of teaching as well as to the learning experiences provided to
achieve these objectives. Evaluation, therefore, is based on these
two aspects of the educational process and at the same time
provides evidence about the efficacy of the two aspects. Intended
learning outcomes, collection of evidence, analysis and
interpretation of evidence, reporting of evidence and the use of
evidence can be visualized as the five major aspects of evaluation.

When we think of the intended learning outcomes of a parti- 
cular unit of teaching, we have to specify the objectives of that 
unit and even decide about the level of learning in that parti- 
cular objective which the student is expected to achieve after 
instruction. Besides, how many students do we expect to attain 
that level? Apart from this, the time-limit for obtaining the inten- 
ded level of achievement can also vary from learner to learner. 
Therefore, in the new pattern, there is flexibility in the expected 
level of learning in terms of the time required to achieve that 
level. Students can take their own time to reach the expected 
level of achievement. 

As for collection of evidence about pupils’ growth, the
significance of the data-gathering devices cannot be
over-emphasised. Similarly, the reliability of the data-gathering
devices, the usability of the evidence and its systematic
recording are the main features to be reckoned with. It may be
pointed out here that the relevance of the evidence, rather than the
reliability of the evidence, is more important in teacher-made
tests.

For analysis and interpretation of evidence, evaluation 
should be more concerned with the objective-wise performance. 
Norm-referenced measures cannot, therefore, be considered as 
important as criterion-based measures in the examinations of 
tomorrow. 

For reporting of evidence, what is intended is to indicate the 
level of performance rather than a judgement on the performance.
Performance cannot be taken as an absolute measure but
only as a relative one which is, in fact, only an estimate
and not a perfect measure. Thus the focus of evaluation
would be on diagnostic appraisal rather than on labelling of 
a student; on improvement of his achievement rather than on 
measurement of it. Judgement is revisionary in nature rather 
than irreversible, as we find in today's public examinations. 


pupils’ growth in different aspects of their development is
reported to students, teachers, parents, employers and
institutes of higher learning, the more appropriate and
reliable would be the decision taken for classification,
certification and selection of students for different purposes.
Therefore, regular recording of pupils’ performance in 
various areas of development is a pre-requisite for 
every evaluation programme. 


6. The Bases of Trends 


This section deals with the trends in evaluation. That
a change in emphasis in evaluation is conditioned by a
shift in educational theory is not difficult to conceive. Moreover,
it is also not difficult to establish a relationship between
a trend and the social need. Like a fashion, which is not only
the outcome of a social need but also of a craze for deviation
from the normal, a trend does not necessarily reflect the
fulfilment of social needs. It may receive social support or engender
hostility and criticism. Nevertheless, a trend does depend on
the cumulative pressure built against the unwanted tradition
which is alleged to have stopped meeting the educational need
of the time. The same is applicable to evaluation as well. As
long as evaluation is considered the intelligence service of the
teaching-learning process, its emphasis would continue to vary
with the educational process, but the longevity of a trend
would depend on the utility and diffusibility of the change,
which indeed stems from various predisposing factors.


6.1. Philosophical Basis 


A system of evaluation, vis-a-vis the educational process, is
a reflection of the philosophical, societal, psychological,
scientific and ecological bases of the educational system.
Therefore, to trace the trends in evaluation we have to
identify those factors which necessitate the change under
each of the above-mentioned bases of education. The
democratisation of the process of education has led to the
recognition of the worth of an individual and his role in
the decision-making process. His ability to participate in the
learning process with his assets and liabilities cannot now be
questioned. The purpose of evaluation has to be thought of
in terms of the values of the system of which it is an integral
part. Evaluation as a service component rather than as a
judgemental process has, therefore, to be appreciated. Thus
evaluation must take into account such skills and abilities
as are conducive to the development of the individual.


6.2. Social Basis 


From the societal point of view, of late there has been
scathing criticism of the utilisation of public funds on
educational investment. To what extent expenditure on big
curriculum projects like B.S.C.S. has been commensurate
with the dividends that accrued from it was a question raised
by the public in the U.S.A. Is the content of education, vis-a-vis
inclusion of various subjects, in accordance with the social
needs? Can we really pass irreversible judgements without
having a negative impact on the examinee’s personality? Are
examinations really economical? All these and such other
questions have created among administrators the need for
accountability of all new educational endeavours. So the
evaluation machinery cannot remain oblivious of the need for
cost-benefit analysis in any programme of educational
evaluation.


6.3. Psychological Basis 


From the psychological angle, there is increasing realisation of the
role of objectiveness in teaching and its impact on learning.
More and more student involvement, individualised,
self-paced learning, diagnostic teaching and learnability are,
among others, the preferred modes of instruction and
effective learning. An evaluator of today and tomorrow
would, therefore, be compelled to gear his evaluation to the
appraisal of such abilities using appropriate techniques and
relevant situations.


6.4. Political Basis


Politically, education in developing countries is no longer
the monopoly of the elites but a birthright of each citizen
irrespective of caste, creed and class. The needs of the weaker
sections of society are considered on a priority basis.
Differentiated curricula, graded textbooks and diagnostic evaluation
are receiving more and more attention in democratic countries.
[...] are facing problems
of de-emphasising these examinations and delinking them
from employment, giving way to teacher-based continuous
assessments.


6.5. Scientific Basis


The scientific bases of measurement are now questioned more for
their reliability and relevance to the goals of education. Grading
rather than numerical scores is gaining ground. The need for
well-designed question papers, scoring objectivity,
comparability of results and sound judgements are the demands raised
as a sequel to the onslaughts of misplacement and
misclassification of students resulting from the present tools and
techniques of evaluation.


6.6. Ecological Basis 


Lastly, the impact of examinations on the various components
of the teaching-learning process, especially the pathology of
examinations, is now attracting more attention of the evaluators
and educationists than before. The use of evidence resulting
from evaluation procedures for adapting ends and means to
each other is now being considered more or less obligatory
to improve students’ achievement and instructional practices.
To ward off the strangulating influences of examinations on the
teaching-learning process there is need for understanding the
interaction of evaluation with other components of the
teaching-learning process. Thus the present-day evaluators are
becoming more and more cognizant of the ecology of
examinations.


7. The Underlying Principles


From the above-mentioned bases viz. the philosophical, the 
societal, the psychological, the political, the scientific and the 
ecological, one may be tempted to identify the underlying 
principles which would form the basis for tracing the 
various implications and trends in evaluation. Such trends 
which emerge from these principles may relate to various 
phases of evaluation such as collection of data, analysis, interpretation,
judging and decision making besides the very concept
of ‘evaluation’ itself. The following principles may be 
delineated: (3) 


(a) A knowledge of inadequacies in learning motivates the 
learner to feel the need for improvement in learning. 

(b) A knowledge of expected learning outcomes provides 
direction for teaching and learning. 

(c) A continual diagnosis of students’ gaps in learning accele- 
rates students’ learning if proper remedial measures are 
taken regularly. 

(d) The more comprehensive the evidence about pupils’
growth, the more sound are the judgements made about
learners and instruction.

(e) If the learner is judged in terms of his own assets and
liabilities and his adequacies in learning are continuously
reinforced, he develops a positive self-concept which
improves his learning.

(f) Regular feedback of test-results generates a dynamic 
interaction among the different components of the educa- 
tional process which leads to homeostasis in the learning 
milieu. 

(g) Good evaluation practices promote better learning and
improve teaching. 


The above-mentioned assumptions pave the way for identifying
the trends in educational evaluation, some of which are
tangible while others are intangible.


8. The Emerging Emphases 


(a) The Broadened Scope 


Recent literature in the field reveals that evaluation is no longer
limited to collection of data. Its scope is being extended to
description of measurement for forming judgements on the
basis of which decisions are made about the learner and the
learning process. Measurement is a prerequisite to any
evaluation but is not synonymous with it. The focus of evaluation is on
improvement of students’ achievement through timely provision
of data about their inadequacies in learning.
Therefore: 


The restrictive concept of measurement is broadened to that 
of evaluation thereby focussing more attention on the 
decision-making role of evaluation (Pedagogical) than 
merely on its measurement role. (4) 


(b) Objective Centredness 


Since Bloom’s taxonomy (1956), followed by Krathwohl’s (1964)
and Harrow’s (1972), emphasis on objective-based teaching and
testing has been on the increase. Despite some difficulties, the
objective-based approach to evaluation has come to stay.
Specification of objectives in terms of behaviour is now considered useful
in providing direction to teachers for both teaching and testing.
Both norm-referenced and criterion-referenced models of
teaching emphasise the need for unit objectives against which
teaching and testing are validated. The content is not the
objective of testing but a vehicle or medium for testing various
skills and abilities.
Therefore: 


Emphasis on testing of content is shifted to testing of objec- 
tives of discipline thereby highlighting the need for appraisal 
of intended learning outcomes of a learning unit through 
the medium of prescribed content. 



(c) Formative Function 


Evaluation is not a one-shot affair at the end of a course, a
term or a semester. It is now considered a service component
of the teaching-learning process involving a continuous appraisal
of pupils’ growth with a view to adapting instruction to improve
students’ learning.

Therefore: 


Emphasis has shifted from summative evaluation to
formative and diagnostic evaluation in order to make
evaluation a quality-control device and a service
component.


(d) Total Development Directed 


Recognising the integrative development of the child in the
cognitive, affective and psychomotor domains, the
present evaluation procedures limited to appraising of cognitive
development cannot be considered adequate to appraise the 
total child. Since affective entry behaviour enhances learning 
and psychomotor skills act as a limiting factor besides provid- 
ing motivation and interest, their role in the process of learning 
cannot be overemphasised. 
Therefore: 


Emphasis in evaluation has extended from mere appraisal 
of cognitive development of the child to the whole child 
to encompass the total growth of pupils, in both scholastic 
and non-scholastic areas. 


(e) Illuminative or Descriptive Evaluation 


Because of the comprehensive scope of evaluation to appraise 
the total child, it becomes imperative to evaluate pupils in 
all such situations which are relevant to the gathering of eviden- 
ces relevant to the area of scholastic and non-scholastic traits. 
Thus the learners’ peers, parents, community, teachers and the 
learner himself are the relevant sources of evidences which have 
to be tapped in the context of the social milieu which 
conditions learning. The mere psychometric data collected from 



the experimental approach will, therefore, not do. Some sort 
of a social-anthropological approach is called for which could 
illuminate the total array of agents, relations and contexts so as 
to reflect the impact of the total learning milieu on students’ 
learning. 

Therefore: 


The traditional Agricultural-Botany paradigm involving 
experimental design of evaluation is giving way more and 
more to the descriptive social anthropological model that 


takes into account the context, the social set up and parti- 
cipants’ observations which have a direct bearing on illu- 
minating the learners’ achievement and growth. (5) 


(f) Diagnostic Focus 


The shift from prediction to diagnosis is evident now, especially
in the context of the recognition of the weaker section of students.
Encouraging research-based results from remedial programmes
undertaken as a sequel to diagnostic testing have resulted
in more emphasis being given to diagnosing weaknesses than to
measuring achievement for grading. Diagnosis, remediation and
improvement of achievement are the focus.

Therefore: 


The stress of today’s measurement is more and
more on identification of students’ difficulties and errors
rather than on measurement of achievement in a given
content. The focus of evaluation is now on improvement
of students’ achievement rather than on measurement of
achievement.


(g) Relevance of Evidences 


Once we accept the role of illuminative evaluation in the
educational scene we are bound to gather evidence about various
aspects of students’ development through different sources
and personnel who watch the learner grow through the corridors
of time. This would require not only the use of a wide variety
of tools and techniques which could portray the evidence of
each type but also the opinion of all those who are concerned
with the growth and development of the pupils. Techniques
like those of observation, rating scale, checklist, sociometric
devices, performance tests etc. would now find a more
prominent place in educational evaluation. This would also generate
evidences that are more relevant to different aspects. The process
of performance will in no way be less significant than the end
products of learning.
Therefore: 


The relevance of measurement is being considered more 
important than the reliability of measurement as far as the class 
tests are concerned. More and more emphasis is given to the
judgement of persons rather than to the tools and techniques they use.


(h) Processing for Meaningful Scores 


The methodology of processing is at present limited to arbitrary
cut scores and divisions into which students are classified on the
basis of raw scores. This has caused difficulty in comparing
students’ performance from subject to subject and from
examiner to examiner. The need for scientific interpretation in
combining the scores is increasingly felt now in order to pass sound
judgements. Likewise, the use of grades in place of marks is
considered a more reliable measure for minimising misclassification
of students when viewed in the light of the low reliability of existing
instruments of measurement.
Therefore: 


The methodology of processing the evidence rests more and 
more on subject-wise analysis of results which are more 
diagnostic than global scores. Transformation of raw scores into 
standard scores for combining marks and their scaling is 
preferred to arithmetic totalling for enhancing the 
meaningfulness of scores. Alternatively, use of grades in place of
marks is recommended more strongly both in internal and
external examinations.
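Why standard scores are preferred to arithmetic totalling when marks are combined can be seen from a small example. In the sketch below the two subjects, the three pupils and their marks are hypothetical, and the population standard deviation is an assumed choice; the point is only that raw totals are dominated by whichever paper has the wider spread of marks.

```python
from statistics import mean, pstdev


def standardise(scores):
    """Convert one subject's raw marks to z-scores (mean 0, SD 1)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def combine_raw(*subjects):
    """Arithmetic totalling of raw marks, pupil by pupil."""
    return [sum(marks) for marks in zip(*subjects)]


def combine_standard(*subjects):
    """Totalling after each subject has been converted to z-scores."""
    standardised = [standardise(s) for s in subjects]
    return [sum(values) for values in zip(*standardised)]


# Hypothetical marks of pupils A, B and C.
mathematics = [90, 50, 10]   # widely spread marks
history = [50, 55, 60]       # narrowly spread marks, in the reverse order

print(combine_raw(mathematics, history))                              # [140, 105, 70]
print([round(v, 6) + 0.0 for v in combine_standard(mathematics, history)])
```

On raw totals pupil A is far ahead because Mathematics marks are widely spread, although pupil C is just as far ahead in History as A is in Mathematics. Once each subject is standardised, the two subjects carry equal weight and the three pupils come out level.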



(i) Criterion-referenced Judgements 


At present group-based (class) or norm-referenced judgements
are common. At the public-examination level, the irreversibility of
judgements such as pass, fail or third division has an adverse
effect on students’ learning and development. Declaring students
as deviants, failures and under-achievers with reference to a
norm group has a corroding effect on the personality of the
students. Every student benefits in accordance with his
aptitude, time allowed to learn, perseverance, quality of instruction
and his ability to benefit from instruction. As such, evaluation
must look for reference to the child himself as far as progress
is concerned. Likewise, achievement on the predetermined
criteria of success on a topic or a skill is more important than
knowledge of his deviation from the norm group. (5)

Therefore:


Self-referenced and criterion-referenced measurement rather 
than norm-referenced measurements; mastery judgement rather 
than deviation judgements; reversibility rather than 
irreversibility of judgements, are the significant departures 
sought for by the modern evaluators to provide sound basis for 
academic and administrative decisions. 
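The three kinds of judgement contrasted above can be set side by side for a single pupil. The sketch is illustrative only: the 80 per cent mastery criterion, the class marks and the function names are all hypothetical, and the deviation judgement is expressed here as a z-score using the population standard deviation.

```python
from statistics import mean, pstdev


def mastery_judgement(score, max_score, criterion=0.8):
    """Criterion-referenced: has the pupil reached the predetermined level?"""
    return score / max_score >= criterion


def deviation_judgement(score, group_scores):
    """Norm-referenced: how far the pupil lies from the group mean, in SD units."""
    m, s = mean(group_scores), pstdev(group_scores)
    return (score - m) / s


def self_referenced_gain(previous_score, current_score):
    """Self-referenced: progress measured against the pupil's own earlier score."""
    return current_score - previous_score


# Hypothetical unit test out of 20 marks in a high-scoring class.
class_scores = [16, 17, 18, 19, 20]
pupil_now, pupil_before = 16, 12

print(mastery_judgement(pupil_now, 20))                        # True
print(round(deviation_judgement(pupil_now, class_scores), 2))  # -1.41
print(self_referenced_gain(pupil_before, pupil_now))           # 4
```

The same pupil has reached the criterion and has gained four marks over his earlier performance, yet appears as a low scorer when judged only by his deviation from this particular class.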


(j) Feedback for Equilibrium in Teaching-Learning Process 


Feedback and correctives being essential features of any
evaluation programme, the examining agencies and class teachers are
now more conscious of the need for feedback of results to the
institutions and students respectively. In fact very little work is
available on the ecology of examinations. But there is a growing
consciousness among the teacher-evaluators and
administrators to use the cybernetic approach of systems,
whereby the impact of one component of evaluation on other
components in a given system or educational process could be
known by understanding the interaction among them.
How we can use evidence for generating a dynamic interaction
to energise the teaching-learning system by using evaluation as
feedback is the emerging interest of evaluation experts of today.


Therefore: 


The pathology of evaluation practices spread through the 
use of evaluation as a judgemental device that leads to un- 
desirable influences on teaching-learning process is being 
sacrificed now at the altar of the ecology of evaluation, 
which aims at dynamic equilibrium by adjusting the teach- 
ing-learning strategies to improve students’ learning (7). 


In the second part of this chapter an attempt has been made to highlight
the significance of the philosophical, psychological, societal,
political, scientific and ecological bases of educational evaluation.
Each of these bases pinpoints some nodal issues which become
the basis of change in the emphasis on evaluation practices.
However, all such issues or foci are based on certain assumptions
relating to the learner and the learning process. Though
some people may question these assumptions, yet these do reflect
some consensus. From these assumptions follow the various
trends or emphases which we find today reflect the [...],
values, preferences [...] and promises. To what extent these trends
[...] relevant and practicable depends on [...]
that our clientele develops, the availability of resources [...]
potential energy of the innovation. For a trend to be [...],
stabilised and reflected in the system we have to [...]
hope confidently and wait patiently [...] the results of our
endeavours, like a good researcher.




10. Need of the Day 


Once the role of evaluation in teaching and learning is appreciated
by the teachers, students and administrators, the
importance of external examination would be relegated to a
secondary position. Such examinations can be used to compare
the [...] institutions, to select students for
admissions and [...]. When schools become the agency for
conducting examinations and teachers the agent of this [...],
the [...] of students' evaluation cannot be challenged. The
teachers who teach would assess the [...] of their
wards. Students would not devise [...]




bringing quality control in the teaching-learning process. 

(f) develop better insight into the dynamic role of educational 
evaluation for improving pupils’ learning as well as the 
teaching-learning strategies. 

(g) identify the reason and issues related to delinking of 
degrees in certain jobs. 

(h) appreciate the need for judging pupils' performance, curri- 
culum effectiveness, teachers' accountability and programme 
evaluation in the context of total school of evaluation vis- 
a-vis the learning milieu and social setting. 

(i) understand the role of National Testing Service in compar- 
ing and setting standards. 


2. Evaluation Process and Examination Reform 


Under article 8.23 of the N.E.P. “assessment of performance is
regarded as an integral part of any process of learning and
teaching. As part of sound educational strategy, examinations
should be employed to bring about qualitative improvements in
education.”


This recommendation envisages the importance of examina- 
tion reform as a means to improve teaching-learning strategies 
as well as the improvement of the performance standards by 
bringing quality control in the whole process of education. 
This requires relating examination reform to: 


(a) curriculum development process. 

(b) teaching-learning strategies. 

(c) development of instructional and learning material. 

(d) mechanics of developing instruments of evaluation, conduct 
of examinations etc. 

(e) learning milieu of the institution. 


From the above mentioned relationships we may observe 
that a close and inter-dependent relationship exists among pupil 
evaluation, curriculum evaluation and institutional evaluation, 
focus of each being on improvement of students’ learning, 
teaching-learning strategies and learning environments.

Article 8.24 states that the “objective will be to recast the 


Evaluation in National Policy on Education —1987 465 


examination system so as to ensure a method of assessment that is a
valid and reliable measure of student development and a
powerful instrument for improving teaching and learning.” In
functional terms, this would mean, according to the N.E.P.:


(a) the elimination of excessive element of chance and sub- 
jectivity. 

(b) the de-emphasis of memorisation. 

(c) continuous and comprehensive evaluation that incorporates 
both scholastic and non-scholastic aspects of education, 
spread over the total span of instructional time. 

(d) effective use of evaluation process by teachers, students 
and parents. 

(e) improvement in the conduct of examinations. 

(f) the introduction of concomitant changes in instructional 
materials and methodology. 

(g) introduction of the semester system from the secondary 
stage in a phased manner; and 

(h) the use of grades in place of marks. 


Under this article (8.24) two directions are indicated: 


(a) To improve the quality of measuring instruments for more 
dependable evidences. In this regard focus is on: 


(i) maximising the objectivity in assessment. 

(ii) de-emphasising the role of memorisation in construc- 
tion of instruments of evaluation. 

(iii) use of grades in place of marks. 


(b) To improve examination system in such a manner that 
they become useful devices for bringing improvements in 


the teaching-learning system. For this focus is on: 


(a) phased introduction of semester system at secondary 
stage, thereby, ensuring regular study; better feedback, 
more intensive coverage of syllabus for examination 
and apportioning curriculum load in four semesters. 

(b) incorporating non-scholastic areas of pupils’ develop- 
ment for more comprehensive assessment. 




(c) continuous appraisal, diagnosis, remediation and feedback
in order to improve students’ achievement and
to adjust or adapt teaching-learning strategies.

(d) diagnostic and formative function rather than on 
judgemental function of assessment, thereby, providing 
feedback to teachers, students and parents about 
pupils’ adequacies and inadequacies in learning. 

(e) developing mechanics of examinations scientifically to 
make examination marks, vis-a-vis results, more simple, 
meaningful and comparable. 

(f) dove-tailing teaching-learning practices, instructional 
and learning materials, curriculum transactions and 
other collateral changes in the system by using evalua- 
tion as quality control device and service component 
of the educational process. 


According to article 8.25 “the above goals are relevant both for 
external examination and evaluation within educational institu- 
tions. Evaluation at the institutional level will be streamlined 
and predominance of external examination reduced". 

This article focuses our attention on: 


(a) close relationship between external examination and school 
evaluation.

(b) developing a sound system of continuous comprehensive 
evaluation in schools to make teachers' assessments more 
valid and reliable. 

(c) the need for making teachers responsible and accountable 
for developing cognitive as well as non-cognitive faculties 
of pupils’ growth. 

(d) reducing the dominating influence of external examinations 
on class room practices and in certifying pupils’ achieve- 
ment for promotions, failures etc. 


3. Delinking Degrees from Jobs 


Articles 5.38 to 5.41 point out that “a beginning will be made in
delinking degrees from jobs in selected areas. The proposal
cannot be applied to occupation-specific courses like Engineer- 
ing, Medicine, Law, Teaching etc. Services of specialists with 




academic qualifications in humanities, Social Sciences, Science 
etc. will continue to be required in various job positions”. 

“Delinking will be applied in services for which a University 
degree need not be a necessary qualification. Its implementation 
will lead to a refashioning of job specific courses and afford 
greater justice to those candidates, who despite being equipped 
for a given job, are unable to get it because of an unnecessary 
preference for graduate candidates". 

With respect to delinking of degrees the focus is on the need 


for: 


(a) considering the candidates having needed skills or job 
training as qualified for appointment as apprentices for 
certain jobs where university degree is not needed for 
selection. 

(b) re-designing all such courses which are job specific to 
make such candidates eligible without requiring a university 
degree. 

(c) continuing the degree requirements in case of occupation
specific courses like medicine, engineering, law, teaching
etc.

(d) devising quality control mechanism in the form of nation-
wide testing programme on voluntary basis.


4. Programme of Actions (P.O.A.) 


Corresponding to N.E.P. the following measures are suggested 
in the P.O.A. in Chapter XVIII relating to evaluation process 


and examination reform. 


4.1. The Policy and Strategies for Implementation 


The policy visualises integration of the assessment of perform- 
ance with the process of learning and teaching, and utilising the 
process of evaluation to bring about qualitative improvement 
in education (para 8.23). In order to ensure that the method 
of assessment of students’ performance is valid and reliable, the 
following short-term measures are proposed: 




(a) At the School Level: 

(i) Public examinations will continue to be held only at 
the levels of Classes X and XII. 

(ii) Decentralisation of the operations involved in the conduct
of examinations to make the system work more
effectively.

(iii) School Boards in certain States have set up a number 
of sub-centres to decentralise the conduct of examina- 


tions. Adoption of similar measures by other States
will be pursued.


(iv) In the event of decentralisation as indicated above, the 
State Boards of School Education would continue to 
get the question papers set and printed, consolidate 
the results of examinations and also undertake test 
checks on random basis of the functioning of the sub- 
centres; and 

(v) Spot evaluation of answer scripts. 


(b) At the University Level: 

(i) Continuous institutional evaluation will be introduced
at the post-graduate level, to begin with, in Unitary
Universities, Deemed Universities and Autonomous
Colleges;

(ii) Students’ performance will be indicated through letter 
grades, and assessment of overall performance will be 
on the basis of cumulative grade point average; 

(iii) Provision will be made for improvement of performance
through subsequent appearances without involving any
disadvantage to the candidate;

(iv) External examinations will continue to be held by
universities which have a large number of affiliated
colleges and efforts will be made to improve the conduct
of examinations through effective decentralisation
as indicated for school level examinations;

(v) Modifications in the qualifying requirements for admis- 
sions in the universities and colleges will be examined 
to accelerate the process of change in the school level 
examinations. 




(c) Conduct of Examinations: 


(i) The possibility of introducing legislation to define
various malpractices connected with examinations and
to treat them as cognizable and non-bailable offences
will be considered;

(ii) Such laws will also, when enacted, make provision to
prescribe the nature and type of punishments for
various offences under the law, and to include within
their scope persons engaged in various operations connected
with examinations and also to provide protection
to them; and

(iii) Innovations and experiments in the conduct of examinations,
like printing and distribution of question
papers with questions arranged in different sequences
to avoid copying and other unfair means in the examination
halls.


4.2. Integrating Evaluation with Teaching and Learning 


In order to attain the objective of integrating the process 
of evaluation with teaching and learning, several long-term 
reforms will be necessary. For this purpose, the following pro- 
grammes would be considered: 


(a) At the School Level: 


(i) The Boards of Education will lay down the levels of
attainment expected at classes V, VIII, X and XII;

(ii) The Boards will also prescribe the learning objectives
corresponding to these levels of attainment in terms of
knowledge and comprehension, communication, skills
in the application of knowledge, and the ability to
learn;

(iii) Schemes of evaluation consisting of examinations to
test those aspects of learning which can be assessed
through formal examinations, and the procedure for
assessing those aspects which cannot be tested through
such an examination, will be developed. Abilities and
proficiencies which can and should be assessed through
institutional evaluation will be identified and procedures
evolved for such evaluation;




(iv) The development of schemes of evaluation is a continuing
process. To provide professional support to
this process, the Boards of Education will consider
setting up a Consortium for initiating research and
development in evaluation procedures and in the conduct
of examinations;

(v) For performing this task, the Consortium will adopt 
selected schools as pilot centres and will hold examina- 


tions and award certificates for the students of such 
Schools; 


(vi) Before question papers are set, a detailed design will 
be evolved indicating the weightage to be given to 
various areas of content, types of questions and the 
objectives of teaching/learning; 

(vii) Along with external examinations, continuous institu- 
tional evaluation of scholastic and  non-scholastic 
aspects of education will be introduced; 

(viii) Evaluation of students’ performance will move towards 
cumulative grading system; 

(ix) In the big States, the possibility of establishing more 
than one Board of Education will be considered, so 
that the number of students to be examined by one 
Board does not exceed one lakh; and 

(x) Procedures will be developed for the appointment of 

Chairmen/Secretaries of Boards of Education and 
Controllers of Examinations to inspire confidence 
among public. 


(b) At the University Level: 

(i) The possibility of developing alternate systems of
evaluation in place of external examinations for affiliated
colleges will be explored;

(ii) The question of some universities functioning only as
examining bodies for a number of colleges will be
examined;

(iii) Academic reforms visualised in the policy like flexibility
in the combination of courses, modular structure,
provision for accumulation of credits, redesigning of
courses, etc. will lead to considerable decentralisation
in the evaluation process. Detailed schemes will be
evolved to facilitate transition to new evaluation procedures
concurrently with the changes in the content
and structure; and

(iv) An agency will be developed either as part of the AIU,
or independently, for continuous research and development
in evaluation procedure.


4.3. General Recommendations


(a) Integrity of the examiner is crucial to the credibility of the
examination system. This credibility can be established by
the openness of the examinations. It has to be recognised
that students have the inalienable right to scrutinise their
answer scripts and their evaluation and also compare them
with those of others;

(b) The practice of declaring results in terms of over-all divisions
and pass/fail may be reviewed and substituted by a
system of declaration of results in terms of marks/grades in
each subject separately;

(c) Candidates should have the opportunity to improve upon
their grades through subsequent attempts;

(d) Provisions should be made for clearing examinations in
parts, in conformity with the modular pattern of courses;

(e) The practice of scaling marks of different subjects which
are not at par may be adopted in determining the grades;

(f) Intensive training programmes will be organised for paper
setters;

(g) Question banks will be developed to assist paper setters;

(h) A detailed marking scheme will be developed to ensure
objectivity in scoring answer scripts;

(i) Innovative ideas like open book examination, diagnostic
evaluation etc. may be experimented with;

(j) Separate certificates will be awarded showing the results of
institutional evaluation and external examinations;

(k) The certificate of institutional evaluation may cover academic
achievements as well as non-scholastic aspects;

(l) Attempts will be made to move towards a situation in
which only those who teach will evaluate their students;




(m) Integration of evaluation with the process of teaching and
learning will help diagnose the weaknesses and deficiencies
in education. This diagnostic aspect will be utilised to
develop remedial programmes for weaker sections;

(n) Facilities will be provided in schools and colleges for maintenance
of students’ records to facilitate continuous institutional
evaluation; and

(o) Programmes of training and orientation of teachers will
give special attention to new evaluation methodologies,
setting of question papers, measurement of performances,
etc.


5. Emergent Emphases 


(a) Teacher assessments 

If one tries to analyse the philosophy underlying the New
Educational Policy vis-a-vis the Programme of Action, there seems
to be a growing faith in the teacher, on whom ultimately depends
the quality of instruction vis-a-vis students' learning. More and
more faith is being reposed in the class-room teacher, whose
assessments of his pupils are to be regarded as more dependable
than those of an external agent certifying their achievement. Teachers
alone can illuminate the strengths and weaknesses of their wards
at regular intervals, being in constant touch with them. This
provides the needed basis for, and facilitates, remedial and enrichment
programmes.


(b) Diagnostic evaluation 

Another belief reflected in these recommendations is that
evaluation should be used more as a teaching and diagnostic
device than as a judgemental device. This, it is believed, would
serve better the quality control function of evaluation. The thrust
of reform has to be towards improvement of students’ learning
instead of mere measurement of their learning.


(c) Mechanics of testing 

The need for training of paper setters and other functionaries reiterates
the belief that the technology of paper setting is well
developed and demands trained paper setters, evaluators and
examiners. Likewise, decentralisation of the external examination
for efficient administration, fair conduct of
examinations, and other measures recommended clearly indicate
the need for improving the mechanics of examinations.


(d) Integrating evaluation 

The most significant indication is the stress on regarding evaluation
as an integral part of the teaching-learning process. Ecology
of evaluation is highlighted, of course not to the desired extent,
in the context of total learning milieu and social setting. How
evaluation can be used as a feedback device to develop a
dynamic equilibrium is reflected, though not in proper perspective.


(e) De-emphasising external examinations 

We have been made to believe that the new education policy
recognises the teacher as the real agent of examination reform and that
external public examinations must be replaced by institutional
or teacher's evaluation. However, a bold decision on doing away
with external examinations has not yet been taken. It is only the
de-emphasis of these examinations which is recommended. Perhaps
the administrators are apprehensive about the maintenance of standards at the
hands of internal classroom teachers. Most probably they think
of first establishing a well developed system of continuous comprehensive
evaluation in schools before discontinuation of
external examination.

Nevertheless, the increasing demand for and use of selection tests
for admission to institutes of higher learning and for employment
is sufficient evidence to do away with existing public
examinations at the school stage to pave the way for teachers’
evaluation. How soon the two incompatible philosophies, one
[...] evaluation and the other of having
[...] examinations for certifying students
[...] in the form of teacher based
[...] measures to maintain standards,
[...] complete faith in our teachers, [...]


CHAPTER XIX 


EMERGENT DEMANDS ON TODAY'S 
EVALUATOR 


1. Introduction 


In the previous chapters an attempt has been made to acquaint
the readers with the concept and process of educational
evaluation as well as the tools and techniques of evaluation.
Discussion was limited to the evaluation of cognitive learning
outcomes, interpretation of test scores, use of test results and
the changing emphases in the field of evaluation. The focus was
the written examinations and their improvement at the school
and public examination level. The clientele in view was the
class teacher, the teacher educator and the inservice educator.
Notwithstanding this, the other aspects and areas of
evaluation which are emerging in this field need special treatment.
In fact during the seventies we find a clear drift away
from the measurement value to the pedagogical value of
examinations. The reasons for this are quite a few. Some are
educational, others are social while still others are administrative
in nature. We shall try to identify special areas of evaluation
that need the attention of educators in general and of evaluators
in particular. Three types of directions can be visualised for
exploring emerging emphases. These may be based on the needs
of learners, requirements of teachers and demands of the
administrators.
After going through this chapter the reader is expected to

(a) identify the emerging demands related to the learner
e.g. self evaluation quizzes, criterion referenced measures,
diagnosis and remediation, evaluation of the gifted etc.

(b) infer the teacher based demands in the context of affective
and psychomotor outcomes, evaluation as a teaching device,
descriptive evaluation, question bank and institutional
evaluation, etc.

(c) predict the administrator related demands like those of
internal assessment, marking, scaling and grading,
curriculum and programme evaluation, open book
examination etc.

(d) appreciate the special role of the evaluator in tomorrow's
school.


2. Emerging Demands 
2.1. Learner Based Demands 


(a) Quizzes for self-evaluation 

Various theories of learning stress the need for the development 
of positive self-concept so that the learner is well motivated for 
the tasks he is supposed to undertake. The evaluation system, 
therefore, must provide him rewarding experience in the form of 
success in a test. Feedback of failings should indeed be 
completely eschewed. What is required is the knowledge he 
should have about the adequacies and inadequacies in his 
learning so that he may make positive efforts to improve 
through self-evaluation. His performance need not be judged in 
terms of set norms or deviation from the normal. He should 
not be labelled as under achiever, a failure or a deviant. What 
is needed is his own introspection of his successes and failures. 
For this he may prefer self-instructional programmed 
material (1) like quiz cards which he can use as and when 
he likes to discover what he knows and what he does not. 
He might like to use review exercises given at the end of a
chapter in a text-book. Class room questions by teachers and
students’ responses can also give some idea about one's understanding
of the subject. Other modes of self-evaluation can
also be visualised to enable and guide the pupils to undertake
self-analysis. It is, therefore, suggested that quizzes may
be taken advantage of as a self-evaluation technique.




(b) Criterion-referenced measures 

The traditional evaluation system stresses the need for containing
students’ performance in terms of group performance i.e.
class or school or a state norm. A learner's performance is
judged against the performance of his peers or class mates. How
good or poor his performance is can be measured in terms of
his deviation from the norm. This approach has led to an
erroneous belief that every learner can learn at the same rate
and through the same strategy. Likewise it is assumed that every
student has attained the intended objective. But this is not
correct. The reality is that students with 50%, 40% and 33%
marks have mastered the subject only partially. Therefore, evaluation
must be done in terms of criteria which may be stated in
terms of intended instructional outcomes, or a performance
standard regarded as expected learning. Moreover, the acceptable
level in terms of mastery learning cannot be less than 70%
to 80% mastery of the concepts. In other words, what is desired
is mastery of the concepts beyond 70% by at least 70-75% of
students. For this, criterion referenced measures are needed, as
also the criterion referenced teaching model. (2) Thus emphasis
on mastery learning and criterion referenced testing must be the
focus of good teaching and testing. More discussion on this
demands attention in a full-fledged chapter.
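
The mastery criterion described above can be expressed as a short check. This is only a sketch under stated assumptions: the function names and the sample marks are hypothetical, and the two 70 per cent thresholds are taken from the lower end of the ranges quoted in the text.

```python
def has_mastered(score, max_score, mastery_percent=70):
    """True if one pupil's score reaches the assumed mastery level."""
    return score * 100 >= mastery_percent * max_score


def class_meets_criterion(scores, max_score,
                          mastery_percent=70, required_percent=70):
    """True if at least `required_percent` of pupils reach the mastery level."""
    if not scores:
        return False
    masters = sum(1 for s in scores
                  if has_mastered(s, max_score, mastery_percent))
    return masters * 100 >= required_percent * len(scores)


# Hypothetical marks out of 100 for a class of ten pupils.
marks = [33, 40, 50, 72, 75, 80, 85, 90, 78, 71]
class_ok = class_meets_criterion(marks, 100)

# Pupils who would pass at 33% but have not reached mastery.
needs_remediation = [m for m in marks if not has_mastered(m, 100)]
```

Under these assumed thresholds the pupils scoring 33, 40 and 50 would all count as "passed" on a traditional cut-off, yet none has reached mastery, which is the distinction the criterion-referenced approach is meant to bring out.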


(c) Diagnostic testing and remediation 

Diagnostic testing refers to the discovering of pupils’ weaknesses
for which there may be educational or non-educational causes.
Among non-educational causes, pupils’ family background,
physical defects, social adjustment problems etc. may be
responsible for the poor performance. Among the educational
causes there may be various reasons like poor [...]
background, non-mastery of basic concepts, [...] needed
for learning, defective instructional strategies etc. It is a
well known fact that when weak students are promoted with
33% or 40% marks they are ultimately unable to get benefit
from instruction and go on becoming weaker and weaker till
they stagnate and drop out of the system. Unless proper
diagnosis is undertaken and the reasons for their inadequacies
in learning are identified, it becomes very difficult to provide
any remedial programme. Moreover, it is undemocratic to drag
all the poor students along with the brighter ones in the
instructional programmes and then declare them failures or
under-achievers without giving them an opportunity to improve.
Care of the weaker section of students, especially at the elementary
stage, is not only desirable but imperative in order to provide
the needed remedial instruction and bring them up to the
optimum level. (3)

This field of diagnosis and remediation has been almost neglected
by most of our teachers. Therefore, in a heterogeneous
group as we have in most of the Indian schools, it is essential
that special efforts be made to diagnose students’ difficulties and 
then apply the needed correctives. Very few diagnostic tests 
have been developed and that too only in the field of Reading 
and Mathematics. Even at the doctorate level, we find very 
few studies in this field. A detailed discussion on the need and 
techniques of diagnostic testing and remediation is, therefore, a 
cogent demand of the learners to be met by teachers. 


(d) Evaluation for search of talent 
In spite of the fact that a lot of work has been done to identify
talented students, the fact remains that except for the national
effort made at the N.C.E.R.T. level there is hardly any work being
undertaken at the level of the states. West Bengal is, of course,
already conducting a test for the award of scholarships to its
students under the Jagdish Bose National Talent Search Scheme.
Similar attempts need to be made in all the states in order to
discover talented students who could be nurtured properly so
that they may contribute to national development. A decentralised
scheme of national talent search under the National
Council of Educational Research and Training has been extended
to the States from the year 1985-86, and for the first time a
large scale effort at the State level was made by the respective
States to identify their talented students through state level testing
and for recommending candidates to the N.C.E.R.T. stage
testing. (4) The scheme of Navodaya Vidyalayas, which envisages the
[...] district of India by the end of
[...], is another landmark in the search for rural
talent at the elementary stage and nurturing [...]
Navodaya Vidyalayas. This scheme is sponsored by the Navodaya
Vidyalaya Samiti under the Ministry of Human Resource
Development, Government of India.

There are various schemes and scholarships which are
awarded in various countries, and the criteria of selection vary
from very subjective estimates like the teacher's assessment
to very objective measures like standardised
tests. Nevertheless, what special measures are taken at the
school level by teachers in general to identify such students is
yet to be seen. It would perhaps be appreciated if teachers are
acquainted with the various types of assessment procedures for
identifying such students and at the same time are provided
guidance as to how enrichment programmes for such students
may be given. It would, therefore, be in the national interest if
the evaluators make a deliberate attempt to identify such
students through well designed mental ability tests, aptitude
tests and creativity tests to spot talent. This area of
assessment of creative abilities, therefore, needs to be discussed
in more detail.


2.2. Teacher-based Demands 


A lot of literature is available on evaluation of cognitive
outcomes of learning. However, assessment of affective and
psychomotor outcomes has not yet received the attention needed.
It is yet to be realised by teachers that evaluation can be used
for better learning and also for improvement of the teaching-learning
process. Teachers have to appreciate that assessment
of students’ learning depends on the types of inputs and the
processes that are used and, therefore, the [...] programme
must be judged in terms of both the product and the
process of learning. How examinations can be managed at the
school level needs a well-thought out plan for evaluating
students in schools. Likewise, the need for a good question
bank cannot be over-emphasised. Keeping in view the teachers'
limitations we may, therefore, like to focus the attention of the
teachers on such areas.


(a) Assessment of affective outcomes 

Affective outcomes of learning refer to the development of such
non-cognitive behaviours which are mostly developed unintentionally,
like appreciation, habits, interests, attitudes, values etc.
Such outcomes of learning are developed incidentally, informally
and non-formally and it is seldom that teachers make a special
effort for the development of such learning outcomes. Because of
their intangible nature, teachers have also not realised the
necessity of assessing such outcomes. That is why even
researches in the area of assessment of affective outcomes are
not very many. It is only the highly technical experts who have
devised instruments like attitude scales, interest inventories etc.
that are usable only by experts. More functional and usable
techniques for the class room need to be stressed. Techniques
like observation, inquiry and analysis can be used. Taxonomies
of educational objectives by Krathwohl (5) and Hannah (6) in the
area of the affective domain are available, but not much empirical
evidence is available which can be used as a framework. Therefore,
there is a need to explore the possibility of making assessment
of affective outcomes of learning more objective, with
which teachers must be acquainted.


[...] work in the field of psychomotor [...] taken so seriously.
There is [...] level. However, quite a few taxonomies of the
psychomotor domain have been advocated by experts like Simpson,
Harrow, Alles, Dave and Hannah. In spite of these taxonomies [...]
empirical evidence available [...] outcomes. Even [...] useful
[...] education, home science, etc. need the immediate attention
of [...] provide the [...] in areas [...] The development of [...]
needed objectivity and reliability in marking is [...] challenging
task for the evaluators. The author himself has done some work in
this field
and it is reflected in a monograph (7) published by N.C.E.R.T.
It hardly needs any testimony that in the area of practical
work in all such subjects subjectivity of marking reigns
supreme. A number of issues are involved, especially those
relating to the conduct of examinations in these subjects and the
quality of the tools and techniques in use. It is, therefore,
desirable that today's teacher should gain an insight into the
various problems and possibilities in the assessment of practical
work in various subjects.


(c) Evaluation for Better Teaching and Learning 
Almost every teacher knows that evaluation is a process of
collecting evidence about pupils' learning and as such it
concerns the various types of data gathering devices. The
emphasis, therefore, is on improving the reliability and
validity of the evaluation instruments which are used for
judging students' performance. The role of evaluation in making
judgements is well known. These judgements are normally made
in terms of group performance and are used for certifying
students' achievement, on the basis of which they are sometimes
classified into various categories. Evaluation as a teaching
device or as an experience for better learning is yet to be
realised by practising teachers. In fact, very few teachers
are cognizant of the fact that evaluation can be used for
improving students' learning besides measuring their achievement.
How evaluation at the pre-instructional stage can help to
ascertain the entry behaviour of students is not well known.
Determining adequacies or inadequacies in the entry behaviour of
the learner is essential for placement of students in the
learning continuum. Therefore, evaluation at the
pre-instructional stage helps teachers to provide the basis
for remediation before taking up developmental teaching.
Likewise, how many teachers are aware of the fact that
evaluation in the form of class room questioning can go a long
way in developing those very abilities which are tested through
those very questions? Well-structured and thought provoking
questions used in the class help, in due course, to develop in
students the abilities of interpretation, analysis, hypothesis,
inference etc. (8) Therefore, the use of test material in teaching
must be highlighted in any evaluation system. At the
post-instructional stage, when questions or a unit test are used,
they provide data which help the teacher to know about
adequacies and inadequacies in his teaching strategies. As
emphasised in one of the previous chapters on objective based
testing, various levels of questions can be used to test
various objectives; when used in class room teaching, they
become a learning experience for the pupils, who ultimately
tend to put similar questions. Therefore, evaluation as a
teaching device should also be realised by our teachers in order
to improve their questioning ability. This aspect of evaluation
also demands further elaboration.


(d) Towards Descriptive Evaluation 

The most commonly used approach to testing is the psychometric
approach, which by and large aims at measurement of the
product of learning. This is a statistical approach
that attempts to measure students' learning in terms of gaps or
congruence between the intended and the actual outcomes. Its
focus is on precision and prediction. However, it totally ignores
the social milieu in which learning takes place. The product of
learning, in fact, is the outcome of context, inputs and the
process of teaching and learning. Every pupil learns within a
given social setting and with whatever time, facilities and
quality of instruction he receives. Thus, any product of learning
or its impact depends on all these aspects. Unless evaluation
takes into consideration all such variables and factors that
throw light on pupils' achievement, it will not be possible to
pass any judgement about the product of learning. This sort of
evaluation, which emphasises description and interpretation of
data and explains the outputs of learning in terms of a given
context, inputs and processes, is termed illuminative
evaluation. (9) This approach to evaluation needs the attention
of teachers and evaluators and requires separate treatment.


(e) Management of Evaluation in Schools 
A teacher may be well acquainted with techniques of construction
of various evaluation instruments, yet he may not be able
to manage the evaluation programme properly for lack of insight
into the relationship between various processes and the timing
or scheduling of various activities. What type of tests
should be used, at what time and for what purpose needs good
planning. How are the test results to be recorded and
interpreted for proper comparison of students' performance? How
should judgements be formed and of what kind? What types of
decisions may be taken regarding promotion, classification of
students, diagnosis and remedial programmes? How should
results be reported and communicated to the consumer? How
much time should be devoted to testing in relation to
instruction? How can students' performance in various schools or
in various sections within the same school be compared? All this
requires not only a broad conceptual framework of an evaluation
system but also systematic planning and scheduling of
activities related to evaluation at various points of time for
judging students' learning in various subjects. Management of
school examinations is, therefore, an area that needs more
work. (10)


(f) Teachers’ Question Bank 

Despite the continuous organisation of workshops for training
teachers in item writing, the fact remains that most
teachers are not able to develop good questions testing
different abilities. Moreover, they do not have sufficient time
and expertise to develop a good question bank in their subjects.
This is true of most teachers even in developed countries, and
that is why organisations like the Educational Testing Service
and the National Foundation for Educational Research have taken
up the responsibility of preparing quality materials in various
[...]. If a good question bank is available with the teacher, he
would be in a much better position to make use of that material
and analyse his pupils' performance with a view to providing them
better guidance for improvement of their learning. What type
of question bank can a teacher prepare? What could be the
strategies for preparing such a bank? How can coding of items
be done? How can the question bank be used effectively? These
are some of the issues which need to be discussed in more
detail. In fact, a framework for a teacher's question bank and
its blueprint needs to be developed before taking up the
process of item writing. A try-out of the items is essential to
know the item parameters. It is, therefore, desirable that a
detailed scheme for a question bank be worked out at the
institutional level and orientation in the use of this material
be undertaken for the benefit of teachers. The need for an
educational testing service in this context cannot be
over-emphasised, as reflected in the N.C.E.R.T. (1985) National
Curriculum Framework. (11)
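
As a rough illustration of what a try-out of items can yield, the
sketch below computes two item parameters commonly reported in
classical item analysis: the facility value (the proportion of
pupils answering an item correctly) and a discrimination index
(the difference between the upper and lower scoring groups). The
function name, the 27 per cent group size and the sample data are
illustrative assumptions, not part of any scheme described in
this chapter.

```python
def item_parameters(responses, group_fraction=0.27):
    """Return a (facility, discrimination) pair for each item.

    responses: one list of 0/1 item scores per pupil (hypothetical data).
    group_fraction: share of pupils placed in the upper and lower groups;
    0.27 is a common convention and is only an assumption here.
    """
    n_pupils = len(responses)
    n_items = len(responses[0])
    # Rank pupils by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_pupils * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    params = []
    for i in range(n_items):
        # Facility value: proportion of all pupils who got item i right.
        facility = sum(r[i] for r in responses) / n_pupils
        # Discrimination: upper-group minus lower-group success, per pupil.
        discrimination = (sum(r[i] for r in upper) - sum(r[i] for r in lower)) / k
        params.append((facility, discrimination))
    return params


# Hypothetical try-out: 10 pupils, 4 items (1 = correct, 0 = wrong).
tryout = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 1, 0],
    [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0],
]
for number, (p, d) in enumerate(item_parameters(tryout), start=1):
    print(f"Item {number}: facility = {p:.2f}, discrimination = {d:.2f}")
```

Values of this kind could be stored with each item as part of its
coding, so that a teacher drawing on the bank can select items of
suitable difficulty; how they are actually recorded would depend
on the scheme adopted.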


2.3. Administrators-based Demands 


It is quite obvious that a number of malpractices have crept into
most of our examinations, particularly the public examinations
conducted by various boards of school education. It is,
therefore, necessary that some alternative measures, like the
open-book examination, be thought of. Similarly, maintaining
standards has become difficult because of great diversity in
curriculum standards. What is needed is not only to ensure
uniformity of standards all over India but also to find ways and
means of making scores comparable through statistical measures
like scaling. For quality control, it is necessary that more and
more emphasis be laid on evaluation. At the same time, an attempt
may be made to evaluate the various curricula for the purpose of
renewal. Similarly, programme evaluation must find a place in the
total system of evaluation so that one is able to see the
congruence or the discrepancy between the intended and the
observed results. A new programme of reform needs to be
implemented in a phased manner by different agencies like
N.C.E.R.T., S.C.E.R.Ts, State Evaluation Units and State
Institutes of Education, which can play a significant role in the
implementation of improvement programmes. Besides, research in
examinations must be an on-going process to see the effectiveness
of the various evaluation devices as well as strategies of
evaluation.


(a) Internal Assessment and Continuous, Comprehensive Evaluation 
The need for internal assessment in the school system has been
widely appreciated by all educationists. However, administrators
have quite a few reservations regarding its effective
implementation in schools as well as at the public examination
level. There was a lot of supportive evidence for the internal
assessment programme during the 60s and 70s, but for various
reasons the role of internal assessment as a reliable method of
judging pupils' performance and growth has been questioned. Even
studies on teachers' attitudes towards internal assessment have
not yielded favourable findings. At the public examination,
internal assessment, when shown separately in the certificate,
has created some problems for the teachers. When internal
assessment scores are combined with the external assessment, it
leads to some academic problems of interpretation. Non-cognitive
areas like attitudes, interests, habits, cocurricular activities
etc. do not find a place in the public examinations, despite
their important place in the scheme of internal assessment. As
such, the sanctity of internal assessment is sacrificed at the
altar of practicability.

Should there be internal assessment at all stages of school
education? Should internal assessment be based on only cognitive
areas of development? Should marks on internal assessment
be combined with those of external assessment? What should be
the criteria of internal assessment? How can subjectivity in
internal assessment be minimised? Do the teachers have a
positive attitude towards internal assessment? These are some
of the many issues related to the scheme of internal
assessment. Even the experience of NCERT working with States
like Rajasthan (12) and Tamil Nadu (13) on a Comprehensive
Scheme of Internal Assessment has not yielded the desired
results. It has created more problems than it has solved. There
are studies which clearly indicate the extreme subjectivity of
teachers in assessing their students internally. There are
instances of teachers, even at the college level, who would not
like to undertake internal assessment for reasons which are
more administrative than academic. Therefore, it is necessary
that some sort of survey of internal assessment practices be
undertaken and problems identified, especially those relating to
subjectivity in marking. How internal assessment can be
made a practical proposition that provides more valid and
reliable evidence about pupils' growth is a challenging task for
the evaluators. Therefore, there is need for an incisive
discussion on this major issue related to continuous,
comprehensive evaluation.


(b) Open Book Examinations 

It is an open secret that, because of the various ills of the
existing agencies and the malpractices with which these
examinations are associated, alternative measures have to be
found. A lot of debate has already gone into the experiment of
open book examinations (14). Does the open book examination
envisage the availability to students of all sorts of reference
material or only a text book? Can the open book examination
be used for testing all types of instructional outcomes? Can the
usual type of questions be used or does it demand different
expertise? Would it increase subjectivity in marking, because of
the openness of questions? What would be the position of
objective type questions in this system? Will the results of
students on such an examination be comparable to those of
students who take the existing examination? These and many
more such questions need to be identified and discussed. To
examine the feasibility of this system, different tools and
techniques may be explored, and a discussion of their validity,
reliability and practicability will have to be initiated.


(c) Marking, Scaling and Grading 

These three concepts are very much related and need to be
understood properly in order to compare, maintain and set
standards. What are the various marking systems? What are
the various methods of grading? What are the various techniques
of scaling? Is grading more reliable than numerical
marking? Can scaling be justified? These are some of the many
problems associated with the grading and scaling of results.
These problems involve a lot of statistical handling and are not
within the reach of the ordinary teacher. Nevertheless, what is
needed is that teachers be able to make the results or the scores
more meaningful. Other statistics can be worked out through
calculations under the guidance of experts. Since
there is a lot of variation in the standards of question papers
in various subjects, it is necessary that calibration of marks be
undertaken to make the results more comparable. This area
also needs more technical discussion (15).
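
One simple form that such calibration can take is linear scaling,
in which the raw marks of each paper are converted to a common
mean and standard deviation. The sketch below is only an
illustration under stated assumptions: the function name, the
target mean of 50 and standard deviation of 10, and the two sets
of marks are hypothetical, and other scaling methods exist.

```python
from statistics import mean, pstdev


def scale_marks(marks, target_mean=50.0, target_sd=10.0):
    """Linearly rescale raw marks to a common mean and standard deviation.

    The target values (50 and 10) are assumed for illustration only.
    """
    m = mean(marks)
    sd = pstdev(marks)  # standard deviation of the whole group
    if sd == 0:
        # Every pupil has the same raw mark, so all get the target mean.
        return [target_mean for _ in marks]
    return [target_mean + target_sd * (x - m) / sd for x in marks]


# Hypothetical raw marks on a hard paper and an easy paper.
hard_paper = [30, 40, 50, 60, 70]
easy_paper = [60, 65, 70, 75, 80]
print([round(x, 1) for x in scale_marks(hard_paper)])
print([round(x, 1) for x in scale_marks(easy_paper)])
```

After scaling, a pupil's mark expresses his standing relative to
the group that took the same paper, which is what allows marks
from papers of unequal difficulty to be set side by side. Whether
that is justified in a given case remains one of the questions
raised above.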


(d) Curriculum and Programme Evaluation 

It hardly needs any testimony that the quality of curriculum
determines the standard of education. What are the curriculum
components and how can each of them be evaluated? Objectives,
content, methodology and outcomes of learning are
basically the four major components which need to be evaluated
for the renewal of curricula. Each of these components can be
evaluated at various stages of curriculum development, like the
planning, development, implementation and quality control
stages. For each of the components we can use
reflective evaluation, formative evaluation and summative
evaluation at various stages of the curriculum development
process. Likewise, when any programme is to be evaluated we
have to evaluate its context, inputs, processes, products and
impact. Accordingly, we have to resort to context evaluation,
input evaluation, process evaluation, product evaluation and
impact evaluation. (16) In fact, pupil evaluation, curriculum
evaluation and programme evaluation are to be integrated, as
pupil evaluation is a sub-set of curriculum evaluation, which in
turn is a sub-set of programme evaluation. An evaluator must
gain a thorough acquaintance with this sort of integrated
evaluation approach.


(e) Research in Evaluation 
Research deals with finding solutions to problems by
providing evidence on the effectiveness of the various strategies
involved. It not only provides evidence but also highlights a
number of inadequacies and adequacies in students' learning as
well as in the teaching-learning strategies. It also provides data
on the basis of which new hypotheses are formulated. Research
on examinations may relate to the validity and reliability of the
instruments. It may relate to the mechanics of examinations, the
economics of examinations or the ecology of examinations. There
is a dire need to gain an idea of the various types of research
which have been undertaken in this field and to identify new
areas in which further research is needed. Likewise, research in
curriculum evaluation, programme evaluation and institutional
evaluation could also be undertaken to get empirical evidence
about the efficacy of curricula and programmes. It is,
therefore, essential that our clientele get an insight into the
various areas of research that have been undertaken in the
past and those which need to be undertaken to support or
validate the various reform programmes.


3. Evaluators of Tomorrow 


From the foregoing discussion, it is quite obvious that proper
evaluation is not everybody's cup of tea. It needs a lot of
insight into the process of evaluation. It needs very systematic
planning to identify the various issues relating to evaluation so
that, like a plaintiff, an evaluator can make out a good case. He
is also required to defend his case, especially when adverse
judgements are made. He should be able to justify his stand in
accordance with his terms of reference. He has to advocate his
view-point regarding not only the types of judgements he
made but also why he made them. He might have to plead for
self-evaluation in one case, norm-referenced evaluation in
another case or criterion-referenced evaluation in still
another case when a programme or a curriculum is to be
evaluated. When conflicting views about evaluation reports make
implementation difficult for the administrators, he has to play
the role of an arbitrator between the two parties and enable them
to come to a consensus.

Lastly, he is supposed to pass clear cut judgements on the
basis of evidence collected from various sources. He has to form
his own judgements and take decisions accordingly. Should a
student be promoted or not? Should a student be allowed to
proceed to the next unit of learning? Should a student be given
first class or second class? Is a given curriculum better than a
competing curriculum? Are the results of my class better than
those of the other class? These are the types of situations where
an evaluator has to exercise his own judgement and take a
decision as a judge. Therefore, the job of an evaluator as a
plaintiff, as a respondent, as a pleader, as an arbitrator and as
a judge must be appreciated by the evaluators of today. We would
like to discuss these different roles in more detail in a
separate volume.


Ultimate objective of all evaluation 
is to improve student's learning 
through self-evaluation. 




REFERENCES 


CHAPTER I 


1. DuBois, P.H. (1970): A History of Psychological Testing, Boston,
Allyn and Bacon, Inc.

2. Garrett, H.E. (1951): Great Experiments in Psychology, Third
Edition, Appleton, Inc., New York, pp. 175-81.

3. Russel, Charles (1930): Standard Tests, Ginn and Co., Boston,
pp. 14-15.

4. Knight, Edgar, W. (1940): Twenty Centuries of Education, Ginn and
Co., Boston, pp. 52-53, 62.

5. Lang, Albert, R. (1930): Modern Methods in Written Examinations,
Houghton Mifflin Co., Boston, pp. 2-3.

6. Caldwell, Otis, W. and Courtis, Stuart, A. (1923): Then and Now in
Education 1845-1923, World Book Co., Yonkers, New York, Chapters
145.

7. Mann, Horace (1845): Common School Journal, Vol. VII, No. 19.

8. Chadwick, E.B. (1864): Statistics of Educational Results, The
Museum, A Quarterly Magazine of Education, Literature and Science,
3: 480-84; January.

9. Ayres, Leonard, P. (1918): The Measurement of Educational Results,
17th Year Book of National Society for Study of Education, Part-II,
Public School Publishing Co., Bloomington, Ill., p. 11.

10. Scates, D.E. (1947): Fifty Years of Objective Measurement and
Research in Education, Journal of Educational Research, 41: 241-64.

11. Thorndike, E.L. (1904): An Introduction to the Theory of Mental and
Social Measurement, Teachers College, Columbia University, New
York.

12. Stone, Cliff, W. (1908): Arithmetic Abilities and Some Factors
Determining Them, No. 12, Teachers College, Columbia University,
New York.

13. Thorndike, E.L. (1910): "Handwriting", Teachers College Record, 11:
83-175, March.

14. McCall, W.A. (1920): A New Kind of School Examination, Journal of
Educational Research, 1: 33-46.

15. Ruch, G.M. (1924): The Improvement of the Written Examinations,
Scott, Foresman and Co., Chicago.

16. Tyler, Ralph, W. (1938): The Specific Techniques of Investigation:
Examining and Testing Acquired Knowledge, Skill and Ability, 37th
Year Book of National Society for the Study of Education, Part II,
Public School Publishing Co., Chapter 29.

17. Kearney, Nolan, C. (1953): Elementary School Objectives, Report of
the Mid-Century Committee on Outcomes in Elementary Education,
Russel Sage Foundation, New York.

18. Smith, Eugene, R., Tyler, Ralph, W., et al. (1942): Appraising and
Recording Student Progress, Harper and Brothers, New York.

19. Bloom, B.S. et al. (1956): Taxonomy of Educational Objectives,
Handbook-I, Cognitive Domain, David McKay, New York.

20. Krathwohl, D.R. et al. (1964): Taxonomy of Educational Objectives,
Handbook-II, Affective Domain, Longman, New York.

21. Simpson, E.J. (1964): The Classification of Educational Objectives,
Handbook-III, Psychomotor Domain, University of Illinois.

22. Harrow, A.J. (1972): A Taxonomy of the Psychomotor Domain, David
McKay, New York.

23. Wood, Sir Charles (1854): Wood's Despatch.

24. Hunter, Sir, W.W. (1882): Report of Indian Education Commission
(1882-83), Calcutta, Government Printing.

25. Indian Universities Commission (1902): Report of Indian Universities
Commission.

26. Curzon, Lord (1904): Resolution on Indian Educational Policy,
Government of India.

27. Government of India Resolution (1913): Resolution on Indian
Educational Policy.

28. Sadler, Michael (1919): Report of the Calcutta University
Commission (1917-19).

29. Hartog, P. (1929): Report of the Indian Statutory Commission
(Hartog Report).

30. Sapru Committee (1934): Report of the Sapru Committee, Government
of U.P.

31. Sargent Report (1944): Post-War Educational Developments, Indian
Government Printing.

32. Radhakrishnan, S. (1950): Report of the University Education
Commission, Government of India, New Delhi.

33. Mudaliar, A.L. (1952-53): Report of the Secondary Education
Commission, Government of India.

34. Narendra Deva Acharya (1953): Report of U.P. Committee on
Reorganisation of Secondary Education.

35. A.I.C.S.E. (1955): Report of the First Meeting of All India Council
for Secondary Education, Government of India.

36. D.E.P.S.E. (1957): Report of the First Conference of the Chairmen
and Secretaries of Boards of Secondary Education, D.E.P.S.E., New
Delhi.

37. D.E.P.S.E. (1958-61): Reports of 2nd, 3rd and 4th Conferences of
Chairmen and Secretaries of Boards of Secondary Education,
D.E.P.S.E., New Delhi.

38. Nag Chaudhary (1969): Report of the Committee on Reorganisation of
N.C.E.R.T.


39. D.E.P.S.E. (1953-67): Reports of 5th, 6th, 7th and 8th Conferences
of Chairmen and Secretaries of Boards of Secondary Education,
D.E.P.S.E., New Delhi.

40. Mudaliar, A.L. (1952-53): Report of the Secondary Education
Commission, Government of India.

41. Ministry of Education (1968): National Policy of Education,
Ministry of Education, Government of India.

42. Rao, V.K.R.V. (1971): Report of the Committee on Examinations,
C.A.B.E., New Delhi.

43. University Grants Commission (1976): Examination Reform: A Plan
of Action, U.G.C., Zafar Marg, New Delhi.

44. N.C.E.R.T. (1975): The Curriculum for the Ten-Year School: A
Framework, N.C.E.R.T., New Delhi.

45. N.C.E.R.T. (1976): Higher Education and its Vocationalisation:
Approach Paper, N.C.E.R.T., New Delhi.

46. Ministry of Human Resource Development (1986): National Policy on
Education, Ministry of H.R.D., Department of Education, New Delhi.


CHAPTER II


1. Lindquist, E.F. (ed.) (1951): Educational Measurement, American
Council on Education, Washington, D.C., pp. 533-40.

2. Popham, James, W. (1981): Modern Educational Measurement,
Englewood Cliffs, N.J., Prentice Hall.

3. TenBrink, Terry, D. (1974): Evaluation: A Practical Guide for
Teachers, McGraw Hill Book Company, New York, p. 11.

4. Gronlund, N.E. (1971): Measurement and Evaluation in Teaching, 2nd
ed., The Macmillan Company, New York.

5. Bloom, B.S., et al. (1956): Taxonomy of Educational Objectives,
Handbook-I, Cognitive Domain, David McKay Co., Inc., New York.

6. Scriven, Michael (1967): "The Methodology of Evaluation", AERA
Monograph Series on Curriculum Evaluation, No. 1, Rand McNally
& Co., Chicago.

7. Bloom, B.S., Hastings, J.T. and Madaus, G.F. (1971): Handbook of
Formative and Summative Evaluation of Student Learning, McGraw Hill
Book Co., New York.

8. Singh, Pritam (1985): Diagnostic Evaluation and Remediation for
Effective Learning, N.C.E.R.T., New Delhi.

9. Mager, R.F. (1962): Preparing Instructional Objectives, Palo Alto,
Calif.

10. Hamilton, et al. (1974): Classroom Research: A Cautionary Tale,
Research in Education, 11.

11. Trow, M. (1957): Comment on "Participant Observation and
Interviewing: A Comparison", Human Organisation, 16(3), pp. 33-5 (In
McCall and Simmons, 1969).




12. Hamilton, D. et al. (1977): Beyond the Numbers Game, Basingstoke
and London, Macmillan Education.

13. Popham, W.J. (1981): Educational Evaluation: Goal Attainment Models.

14. Tyler, Ralph, W. (1949): Basic Principles of Curriculum and
Instruction, The University of Chicago Press, First Britain
Impression.

15. Scriven, Michael (1971): Goal Free Evaluation, Part-I, 11-71,
N.I.E., 2-A.

16. Stenhouse, Lawrence (1975): Humanities Curriculum Project: An
Introduction to Curriculum Research and Development, London,
Heinemann.


CHAPTER III 


1. Garrett, Henry, E. (1958): Statistics in Education and Psychology,
Allied Pacific Pvt. Ltd., Bombay, Indian Ed., p. 27.
2. Garrett, Henry, E. (1958): p. 29.
3. Garrett, Henry, E. (1958): p. 31.
4. Garrett, Henry, E. (1958): p. 50.
5. Garrett, Henry, E. (1958): p. 69.
6. Scannel, D.P. and Tracy, D.B. (1975): Testing and Measurement in the
Classroom, Houghton Mifflin Co., Boston, p. 190.
7. Scannel, D.P. and Tracy, D.B. (1975): Testing and Measurement in the
Classroom, Houghton Mifflin Co., Boston, p. 192.
8. Garrett, Henry, E. (1958): p. 143.
9. Garrett, Henry, E. (1958): p. 372.
10. TenBrink, Terry, D. (1974): Evaluation: A Practical Guide for
Teachers, McGraw Hill Co., New York, pp. 451-55.
11. Diederich, P.B. (1964): Short-cut Statistics for Teacher Made Tests,
Princeton: Educational Testing Service, pp. 19, 34-36.
12. Best, John, W. and Kahn, James, V. (1986): Research in Education,
Prentice Hall of India Pvt. Ltd., New Delhi, 3rd Edition (Indian Ed.),
p. 337.
13. Best, John, W. and Kahn, James, V. (1986): Research in Education,
Prentice Hall of India Pvt. Ltd., New Delhi, 3rd Edition (Indian Ed.),
p. 338.
14. Wolfe, J.M. (1971): A Simple Method of Calculating the Coefficient of
Correlation, Journal of Educational Measurement, 8: pp. 221-22.
15. Scannel, D.P. and Tracy, D.B. (1975): Testing and Measurement in the
Classroom, Houghton Mifflin Co., Boston, pp. 204-206.
16. Spearman, C. and Brown, William (1910): British Journal of
Psychology, 2: pp. 271-95, 296-322.
17. Richardson, M.W. (1939): The Calculation of Test Reliability Coefficient
Based on Rational Equivalence, J. Edu. Psychology, 30: pp. 681-87.
18. Saupe, J.L. (1961): Some Useful Estimates of the Kuder Richardson
Formula Number-20 Reliability Coefficient, Educational and
Psychological Measurement, 21: pp. 63-72.
19. Garrett, H.E. (1958): Statistics in Education and Psychology, Allied
Pacific Pvt. Ltd., p. 350.
20. Lord, F.M. (1959): Tests of the Same Length Do Have the Same
Standard Error of Measurement, Educational and Psychological
Measurement, 19: pp. 233-39.


CHAPTER IV


1. Gronlund, N.E. (1981): Measurement and Evaluation in Teaching,
5th Ed., Collier Macmillan, New York, pp. 25-26.

2. Furst, E.J. (1958): Constructing Evaluation Instruments, David McKay
Co., Inc., New York.

3. Kothari, D.S. (1966): Report of the Education Commission (1964-66),
Ministry of Education, Government of India, Chapter I.

4. Mager, R.F. (1962): Preparing Instructional Objectives, Belmont,
Calif.: Fearon Publishers.

5. National Education Association Commission on Reorganisation of
Secondary Education, Cardinal Principles of Secondary Education, U.S.
Office of Education, U.S.A.

6. Ebel, R.L. (1966): Measuring Educational Achievement, Prentice Hall
of India Pvt. Ltd., New Delhi, 1969-70 (Indian Reprint).

7. Guilford, J.P. (1967): 'Structure of Intellect' in The Nature of Human
Intelligence, N.T.C., McGraw Hill.

8. Bloom, B.S. et al. (1956): Taxonomy of Educational Objectives,
Handbook-I, Cognitive Domain, David McKay, New York.

9. Gagne, Robert, M. (1965): Conditions of Learning, Holt, Rinehart &
Winston, New York.

10. Madaus, G.F. (1973): In "Evaluation in Education: International
Progress", International Review Series, Vol. 1, No. 2, 1977.

11. Hannah, Larry, S. and Michaelis, John, U. (1977): A Comprehensive
Framework for Instructional Objectives: A Guide to Systematic Planning
and Evaluation, Addison-Wesley Publishing Co., London.

12. Singh, Pritam (1977): An Investigation into the Empirical Validity of
Bloom's Taxonomy of Educational Objectives, Unpublished Thesis,
Delhi University.

13. Singh, Pritam (1986): In 'Evaluation at the Elementary Stage', A Book
of Readings, N.C.E.R.T., New Delhi.

14. Krathwohl, D.R. et al. (1964): Taxonomy of Educational Objectives,
Handbook-II, Affective Domain, David McKay, New York.

Simpson, E.J. (1966): Classification of Educational Objectives: Psycho- 
motor Domain, University of Illinois. 

Alles, J. (1967); Analysis of Psychomotor Aspects of Behaviour, 
Theoretical Constructs in Curriculum Development and Evaluation, Sri 
Lanka. 




Dave, R.H. (1971): Report of International Conference of Educational 
Testing at Berlin 1967, University of London Press. 

Harrow, A.J. (1972): A Taxonomy of Psychomotor Domain, David
McKay, New York. 

Hannah, Larry, S. and Michaelis, John, U. (1977): A Comprehensive 
Framework for Instructional Objectives: A Guide to Systematic Planning 
and Evaluation, Addison-Wesley Publishing Co., London. 

Maclay, R.W. (1969): Organisation of Practical Work and Teaching in 
Integrated Science, Post College Course for Teachers, Sydney Teachers
College. 

Moore (1967): A Comprehensive Framework for Instructional Objectives: 
A Guide to Systematic Planning and Evaluation, Addison-Wesley 
Publishing Co., London. 


Eisner, Elliot, W. (1967): Educational Objectives: Help or Hindrance?
School Review, 75, Autumn, pp. 250-60.

Atkin, J.M. (1968): “Behavioural Objectives in Curriculum Design: A
Cautionary Note,” The Science Teacher, 35.


CHAPTER V 


Gronlund, N.E. (1986): Measurement and Evaluation in Teaching, 5th
Ed., Collier and Macmillan, New York, pp. 10, 292-94.


Good, Carter, V. (Ed.) (1959): Dictionary of Education, McGraw Hill
Book Company, Inc., New York.

Good, Carter, V. (Ed.) (1959): Dictionary of Education, McGraw Hill
Book Company, Inc., New York.

Singh, Pritam (1983): Evaluating Students in Elementary Schools, 
Department of Measurement and Evaluation, N.C.E.R.T., New Delhi. 


Good, Carter, V. (Ed.) (1959): Dictionary of Education, McGraw Hill
Book Company, Inc., New York, p. 34.

TenBrink, Terry, D. (1975): Evaluation—A Practical Guide for
Teachers, McGraw Hill Book Company, Inc., New York, pp. 11, 12.

Lindquist (Ed.) (1951): Educational Measurement, Washington D.C.,
American Council on Education, pp. 533-40.

Gronlund, N.E. (1986): Measurement and Evaluation in Teaching, 5th
Edition, Collier, Macmillan, New York, pp. 118-121.

TenBrink, Terry, D. (1974): Evaluation—A Practical Guide for Teachers,
McGraw Hill Book Company, Inc., New York, Chapter 2.

McCormick, Robert & James, Mary (1983): Curriculum Evaluation in
Schools, Billing and Sons Ltd., Worcester, U.K.


CHAPTER VI 


1. N.C.E.R.T. (1979): Minimum Learning Continuum, Publication No. 3,
Primary Curriculum Development Cell, N.C.E.R.T., New Delhi.

2. Gronlund, N.E. (1970): Stating Behavioural Objectives for Classroom
Instruction, Macmillan, New York.

3. Block, James, H. and Anderson, Lorin, W. (1973): Mastery Learning
in Classroom Instruction, Macmillan Publishing Co., Inc., New York.

4. Singh, Pritam (1974): Objective Based Teaching and Testing, Sample
Teaching-Learning Units in Biology, Department of Measurement and
Evaluation, N.C.E.R.T.

5. TenBrink, Terry, D. (1974): Evaluation—A Practical Guide for Teachers,
McGraw Hill Book Co., New York, Chapter I.

6. Bloom, B.S. (1971): Mastery Learning and Its Implications for
Curriculum Development. In Confronting Curriculum Reform by E.W.
Eisner, Little, Brown and Company, Inc., New York, pp. 17-49.

7. Singh, Pritam (1986): Diagnostic Evaluation and Remediation, Department
of Measurement & Evaluation, N.C.E.R.T.

8. Block, James, H. & Anderson, L.W. (1975): Mastery Learning in
Classroom Instruction, Macmillan Publishing Co., Inc., New York,
pp. 33-38.

9. Traxler, A.E. (1959): Ten Essential Steps in a Testing Programme,
Education, 79, pp. 1-6.

10. Singh, Pritam (1983): Evaluating Students in Elementary Schools,
Theory into Practice, Department of Measurement and Evaluation,
NCERT., New Delhi, p. 69.


CHAPTER VII


... Techniques—An Introduction, ...

N.C.E.R.T. (1965-66): Comprehensive Inte... Rajasthan Board of ...n, Ajmer.

Schwartz, ... and Tiedman, Stuart ... Longmans' Green and ...

6. Singh, ... In Preparation and Evaluation of ... Appendix C2, p. 107, N.C.E.R.T.

... John, W. and Kahn, James, ...

...reno, J.L. et al. ... Press, New York.

... Mental Measurement Year Book, ...

10. ... by J.J. Eysenck, ...

... A Practical Guide for Teachers ... McGraw Hill Company, New York, pp. 147-48.


CHAPTER VIII-IX 


Webster, Noah (1981): New Webster's Dictionary of the English
Language, Deluxe Edition, p. 783, Delair Publishing Company, Inc.,
USA.

Singh, Pritam (1972): Dynamics of a Question, Journal of Indian Education,
March Issue, N.C.E.R.T.

Monroe, Walter, S. and Carter, Ralph (1923): The Use of Different
Types of Thought Questions in Secondary Schools, University of
Illinois, Research Bulletin, No. 14.

Weidemann (1941): Review of Essay Test Studies, Journal of Higher
Education, 12: pp. 41-44.

Stalnaker, John, M. (1951): The "Essay Type of Examination", In
Educational Measurement, by Lindquist, E.F., American Council of
Education, pp. 495-530.

Stalnaker, John, M. (1951): The "Essay Type of Examination", In
Educational Measurement, by Lindquist, E.F., American Council of
Education, pp. 495-530.

Harper, A.E. (1975): Researches on Examination, N.C.E.R.T., New
Delhi, pp. 8-52.

Singh, Pritam et al. (1977): Report of Workshop on Developing
Question.

Singh, Pritam et al. (1977): Report of Banks' Examination Reform
Unit, N.C.E.R.T., New Delhi.


CHAPTER X 

Ebel, R.L. (1965): Measuring Educational Achievement, Prentice-Hall
of India Pvt. Ltd., New Delhi, (Indian Print), p. 151.

Ibid., p. 160.

Ibid., p. 185.

Ibid., p. 184.

Ibid., p. 186.

Ibid., p. 179. 

Ibid., p. 181. 

Ibid., p. 174. 


CHAPTER XI 


1. Ebel, R.L. (1966): Measuring Educational Achievement, Prentice-Hall
of India Pvt. Ltd., New Delhi, pp. 144-45.

2. Ibid., pp. 124-27.

3. Ibid., pp. 130-31.

4. Swineford, Frances (1941): Analysis of a Personality Trait, Journal of
Educational Psychology, XXXII, pp. 438-44.

5. Cronbach, L.J. (1942): Journal of Educational Psychology, XXX, in
Ebel, R.L. (1966) p. 135.

6. Remmers, H.H. et al. (1967): A Practical Introduction to Measurement
and Evaluation, (Indian Reprint) U.B.S., Delhi, pp. 253-55.

7. Ibid., pp. 260-61. 

8. Hudson, B. (1973): Methuen Educational Ltd., London, p. 92.

9. Ibid., pp. 95-98. 

10. Ibid., p. 103. 

11. Head, J.J. (1967): Flexibility in Interpretation of an O. Level Mark 
Scheme, Educational Review, 19, pp. 18-128. 


CHAPTER XII 


1. Ebel, R.L. (1965): Prentice-Hall of India Pvt. Ltd. (Indian Reprint) 
New Delhi, Chapter 10. 
2. Lindquist, E.F. (1955): Educational Measurement, American Council
of Education, Washington D.C., p. 568.
3. Ghiselli, E.E. (1953): Journal of Applied Psychology, 37, pp. 18-20.
4. Guilford, J.P. (1956): Fundamental Statistics in Psychology and Education,
McGraw Hill Book Co., Inc., London, p. 436.
5. Nunnally, J.C. (1967): Psychometric Theory, McGraw Hill Book Co.,
New York.
6. Cureton, E.E. (1956): In Lindquist, Educational Measurement,
American Council of Education, Washington D.C., p. 622.
7. Richardson, M.W. (1939): The Calculation of Test Reliability Coefficients
Based on Method of Rational Equivalence, J. Edn. Psychol., 30,
pp. 681-87.
8. Thorndike, R.L. and Hagen, E.P. (1969): Measurement and Evaluation
in Education and Psychology, 3rd Edn., New York, p. 192.
9. Guilford, J.P. (1957): Fundamental Statistics in Psychology and Education,
McGraw Hill Book Co., Inc., London, p. 457.
10. Lindquist, E.F. (1955): Educational Measurement, American Council
of Education, Washington D.C., p. 577.
11. TenBrink, Terry, D. (1974): Evaluation—A Practical Guide for Teachers,
McGraw Hill Book Co., New York, p. 461.
12. Lindquist, E.F. (1942): First Course of Statistics (Rev. ed., Boston)
Houghton Mifflin Company, p. 213.
13. Cureton, E.F., Thorndike, R.L. and Hagen (1951): “Validity” In
Educational Measurement, ed. E.F. Lindquist, Washington D.C.,
American Council of Education, p. 265.
14. English and English (Dictionary).
15. Thorndike, R.L. and Hagen (1955): Measurement and Evaluation in
Education and Psychology, 3rd Ed. New York, pp. 109-110.


CHAPTER XIII 


1. Singh, Pritam (1984): Objective Based Teaching and Testing-Sample 
Teaching-Learning Units in Biology, N.C.E.R.T., New Delhi, pp. 9-11. 
2. Singh, Pritam (1988): Unit Tests in Biology for Classes IX & X (Under
Print) Department of Measurement and Evaluation, N.C.E.R.T., New
Delhi.

3. Anderson, L.W. (1975): Mastery Learning and Classroom Instruction,
pp. 26-29, MacMillan Publishing Co., Inc., New York.

4. Menon, Kamala (1980): Sample Unit tests in Geography, Department 
of Measurement and Evaluation, N.C.E.R.T., New Delhi. 

5. Singh, Pritam (1986): Diagnostic Evaluation and Remediation for 
Effective Learning, Department of Measurement and Evaluation, 
N.C.E.R.T., pp. 30-36. 

6. Srivastava, H.S. and Singh, Pritam (1977): Use of Test Material in 
Teaching, N.C.E.R.T., New Delhi. 

7. Singh, Pritam (1973): Biology Part-I Science for Middle Schools, Test 
items, Department of Science Education, N.C.E.R.T., New Delhi. 




CHAPTER XIV-XV 


1. Bloom, B.S. et al. (1956): Taxonomy of Educational Objectives, Handbook-I,
Cognitive Domain, David McKay, New York.

2. Singh, Pritam (1975): In Monograph on ‘Practical Examination in
Science Subjects’ by Pritam Singh, N.C.E.R.T., New Delhi, pp. 20-23.

3. Rajasthan Board of Secondary Education & N.C.E.R.T. (1967): Sample 
question papers in Biology. Rajasthan Board of Secondary Education. 

4. Singh, Pritam (1988): Sample question paper in Biology for classes IX- 
X (Under Print). 

5. Hudson, B. (1973): Assessment Techniques—An Introduction, Methuen
Educational Ltd., London, p. 196.

6. Ibid., p. 188.


CHAPTER XVI 
1. Singh, Pritam (1966): Improving Teaching and Testing in Biology, State 
Evaluation Unit, Himachal Pradesh, Solan, Shimla Hills, pp. 68-77. 
2. Saupe, J.L. (1961): Some Useful Estimates of the Kuder-Richardson
Formula-20 Reliability Coefficient, Educational and Psychological
Measurement, 2, pp. 63-72.

3. McMorris (1972): Quoted from Monograph on Test and Item Analysis
for Universities by V. Natrajan, A.I.U., New Delhi (1977).

4. Diederich, Paul (1960): Short cut Statistics for Teacher Made Tests,
Princeton.

5. TenBrink, Terry, D. (1974): Spearman Brown Formula for Correcting
Split-half Correlation Coefficient. Quoted from Evaluation—A Practical
Guide for Teachers, McGraw Hill Book Co., New York, p. 457.

6. TenBrink, Terry, D. (1974): Kuder-Richardson Formula-21. Quoted
from Evaluation—A Practical Guide for Teachers, McGraw Hill Book
Co., New York, p. 457.

7. Cronbach, L. (1951): Quoted from Monograph on Test and Item
Analysis, by V. Natrajan, A.I.U., Delhi (1977), p. 183.

8. Nuttal (1969): Quoted from Monograph on Test and Item Analysis by
V. Natrajan, A.I.U., New Delhi (1977), p. 183.


CHAPTER XVII


1. N.C.E.R.T. (1975): The Curriculum for the Ten-Year School—A
Framework, N.C.E.R.T., New Delhi.

2. Department of Curriculum & Evaluation, N.C.E.R.T. (1968): Sample
Unit Tests and Sample Question Papers in Biology, Board of Secondary
Education, Assam, Guwahati.

3. Singh, Pritam (1975): Evaluating Students at the Elementary Stage,
Department of Measurement and Evaluation, N.C.E.R.T., p. 69.

4. Singh, Pritam (1977): Board of Secondary Education, Assam,
Guwahati.

5. Singh, Pritam (1983): Illuminative Evaluation in Science, School
Science, June Issue, N.C.E.R.T.

6. Singh, Pritam (1983): Criterion-Referenced Testing—A Monograph,
N.C.E.R.T.

7. Singh, Pritam (1985): Research on Examinations—A Retrospect and
Prospect. In Contemporary Issues in Public Examinations, NCERT,
pp. 101-31.


CHAPTER XVIII


1. Ministry of H.R.D., Govt. of India (1986): National Policy on
Education—1986, Government of India, Ministry of HRD, Department
of Education, New Delhi.

2. Ministry of H.R.D., Govt. of India (1986): National Policy on
Education—1986, Programme of Action, Ministry of H.R.D., Department
of Education, New Delhi.




CHAPTER XIX 


Singh, Pritam (1974): Use of Quizzes in Radio Broadcast. Paper read
in UNICEF Workshop organised by N.C.E.R.T.

Scannell, D., Tracy, D.B. (1975): Testing and Measurement in the
Classroom, Houghton Mifflin Company, Boston, p. 39.

Singh, Pritam (1985): Diagnostic Evaluation and Remediation for
Effective Learning, D.M.E.S. & D.P., N.C.E.R.T.

N.C.E.R.T. (1986): National Talent Search Scheme, N.C.E.R.T.,
New Delhi.

Navodaya Vidyalaya Samiti, Ministry of H.R.D. (1987): Navodaya
Vidyalaya Scheme, Ministry of Human Resource Development,
Department of Education, New Delhi.

Krathwohl, D.R. et al. (1964): Taxonomy of Educational Objectives,
Handbook-II, Affective Domain, David McKay, New York.

Hannah, Larry, S., Michaelis, John, U. (1977): A Comprehensive Framework
for Instructional Objectives: A Guide to Systematic Planning and
Evaluation, Addison-Wesley Publishing Co., London.

Singh, Pritam (1983): A Monograph on Improving Practical Examinations
in Science, N.C.E.R.T., New Delhi.

Srivastava, H.S., Singh, Pritam (1977): Use of Test Material in
Teaching, N.C.E.R.T., New Delhi.

Singh, Pritam (1983): Illuminative Evaluation in Science, School
Science, June Issue, N.C.E.R.T., New Delhi.

Singha, H.S. (1986): Management of Examinations—A Model. In
Evaluation at the Elementary Stage—A Book of Readings, pp. 103-14,
by Pritam Singh (Edit.) N.C.E.R.T., New Delhi.

N.C.E.R.T. (1985): National Curriculum for Primary and Secondary
Education—A Framework, N.C.E.R.T., New Delhi.

Department of Curriculum & Evaluation, N.C.E.R.T. (1967): Scheme
of Comprehensive Internal Assessment. Board of Secondary Education,
Rajasthan, Ajmer.

S.C.E.R.T., Tamil Nadu (1976): State Council of Educational
Research and Training, Tamil Nadu.

Arora, P.N. (1986): Open Book Examinations, D.M.E.S. & D.P.,
N.C.E.R.T., New Delhi.

Srivastava, A.B.L. (1985): Scaling of Examination Results. In
Contemporary Issues in Public Examinations, pp. 342-347, N.C.E.R.T.,
New Delhi.

Singh, Pritam (1985): Indian Journal of Education, N.C.E.R.T., New
Delhi.


INDEX 


A.A.A.S Plan, 57 
Abbot Wood Report 1936-37, 12 
Ability tests, 425 
Accountability, 31 
Acharya Narendra Dev Committee 
1948, 12 

Achievement 

definition of, 147 
Achievement test 

aim of, 315 

standardised, 5 
Acquisition assignments, 203 
Adaptability 

in curriculum, 446-7 
Adaptive routinisation, 130 
Affective domain, 128-9 
Affective evaluation, 37 
Aims, 107-8 
All India Council for Secondary 

Education 1955 (AICSE), 14-15 
Alles, 130 


Alternative response matching, 249 
Analysis 
approaches for, 432-3 
criterion-referenced, 177 
illuminative approach of, 432.3 
norm-referenced, 176-7 
Psychometric approach of, 432 
Purposes of, 430 
self-referenced, 176 
see also Item analysis 
Anecdotal record, 196 
Appraisal, 35 


of activity Programme, 6 
definition of, 147 


Assessment, 35 
method 
measures proposed for, 467-9 
modes of, 145 
Assignment, 171 
Associational clue, 274 
Assumptions, 153-4 
Attitude scale, 202 
Averages 


measures of, 70-74 


Behaviour Communicativeness, 115 
Bhopal Seminar 1956, 15, 17
Biology
objectives of, 383-4, 397-400
question papers in
sample, 401-2, 405-13
Bloom's taxonomy, 7, 124-5, 253-5, 383
Blue-print, 357, 370-71
aim of, 365
preparation of
assumptions for, 357-8
methodology, 358-63
for question paper, 389
questions based on
framing of, 389-90, 403-4
of unit tests
sample, 372-3
using of, 363-5 
Boston examination project, 3 


C.P. and Berar Committee, 12-13
Central Advisory Board of
Education 1944, 12


Central Services Examinations, 60 
Check list, 195, 299-330 
Chronbach, 295, 340 
co-efficient alpha, 436 
Closed objectives, 118 
Cognitive evaluation, 37 
Column matching 
varieties of, 297-8 
Common core test, 425 
Compositeness 
in curriculum, 446 
Compound matching, 298 
Concurrent validity, 341 
Conferences of Chairman and 
Secretaries of Boards of 
Secondary Education, 16-18 
Constant alternatives 
construction of 
guidelines for, 286-9 
format of, 282-3 
forms of, 283-6 
see also True-false items 
Construct validity, 341-2, 344 
implications of, 349-50 
Content 
objective, 115 
validity, 342 
Correctives, 182-3 
Correlation 
computing methods of, 87-92 
concept, 83-93 
tetrachoric, 87, 88t
Counsellor’s record, 172
Covert behaviour, 118-9 
Criterion-referenced 
judgements, 178 
measurement, 46-56, 460, 476 
Critical-incidence-technique, 109 
Curricular validity, 342, 344 
Curriculum, 152-3, 162 
approach paper on, 444 
assumptions, 445-6 
key concepts in, 446-8 
definition of, 152 
development, 486 
evaluation, 61-2 




Curzon’s resolution 1904, 10 


Data, 67, 449 
techniques for, 187-204, 205t,
206t
organisation 
methods, 68-70 
Decision-making, 179-80 
Derived scores 
types of, 79-82 
Derived validity, 343
Descriptive evaluation, 481 
Deviation score, 74-5 
Diagnosis, 182 
Diagnostic 
evaluation, 42-4, 149, 151t, 472 
studies, 109-10 
testing, 458, 476-7 
Diagram 
abilities tested, 308-11 
classification of, 307 
varieties of, 307-8 
Direct validity, 343 
Disciplines, 111 
Discrimination index 
calculating formulas for, 439-41 
Dispersion 
measures of, 74-9 
Distractors, 250 


Education Commission 1964-66, 
20-21 
Educational evaluation 
cognitive development and, 457-8 
ecology of, 161-2 
evidence-gathering, 458-9
formative function of, 457
generalisations of, 450-52 
objective-based, 456 
recommendations for, 464-73 
scope of, 457-61 
trends in, 452 
bases of, 452-4 
principles for, 455 
Educational objectives 
classification of, 122-4 




Empirical validity, 342 
implications of, 347-9 
Enabling objective, 117 
Entrance examination 
development of, 9 
Error, 154 
coping up of
principles for, 154-7 
Essay-tests, 220, 236-7, 324 
classification of, 221 
extended response type, 221 
facility value of, 437-8 
grading, 226 
methods of, 227-8
improvement of, 223-6
restricted response type, 220-21
scoring objectivity
improvement suggested for,
229-32
scoring reliability
improvement, 228-9
varieties of, 220-2
Estimate
standard error of, 102
Evaluation, 33-4
aspects of, 8, 149-50, 151, 448-50
as continuous process, 158-9
cooperative process, 159
tools and techniques for,
187-204, 205t, 206t
diagnosis and correctives in, 182
as dynamic process, 160
ecology of, 161-2
emerging demands on,
learner based, 457-87
teacher based, 457-87
administrator based, 457-87
integrative model of, 163
interpretation of, 177-80
measurement and, 34
relations with, 148-9
models of
category wise, 53-6
nature and scope of, 146-9
objective-centred, 158
principles, 155-7
process, 164-5
characteristics of, 157-61
examination reform and, 464-7
purpose, 165-8
steps involved, 165-86
reporting, 181-2
summarisation, 180-81
teachers and, 480-81
Evidence collection, 170-71
sub-stages for, 171-4
Evidence processing,
methodology of basis, 459
Examinations,
developments of, 3-29
origin of, 1-3
Exo-internal evaluation, 38
Experiential domain, 133-4 
Experimental evaluation, 51-2 
External evaluation, 38-9 


Face validity, 342 
Facilitating objective, 117-8 
Facility index 

calculating formulas for, 437-9 
Factorial validity, 342 
Feedback, 183-5, 447-8, 460
Formal evaluation, 39-40 
Format clue, 274 


Formative evaluation, 42-3, 149-50, 
151t, 160 


Frequency distribution (FD), 69-79


Gagne-Merrill Taxonomy, 125
Goal-free evaluation, 54-5
Goals
objectives and, 107-8
statements, 107
Grading, 485 
analytico-synthetic method for, 
229-30 
Grammatical clue, 274-5 
Graphic stimuli, 307-11 
Guess-who technique, 201 


Harper's formulas, 439-49 
Hartog Committee 1929, 11 


Heterogeneous responses, 277
Holistic method, 227-8
Homogeneous responses, 277
Hudson, 316
Hunter Commission 1882, 9


Illuminative evaluation, 52-3, 481
Indian University Commission 1902, 
10 
Informal evaluation, 40-1 
Information gathering 
analysis of, 174 
modes, 175-7 
forms of, 174-5 
recording of, 175 
sources of, 172-4 
see also Data gathering 
Inquiry 
techniques of, 198-203 
Instructional objectives, 166 
formulation of, 167 
Integrating evaluation, 473 
reforms proposed for, 469-71, 
473 
Integration
in curriculum, 447 
Intellect 
faces of, 123-4 
Intellectual processes, 127 
Internal assessment, 23, 483-4 
Internal evaluation, 37-8 
Interpretation, 177-8 
Interval scale, 66 
Interview, 200 
Intrinsic validity, 342 
Inventory, 199-200 
Item analysis, 437-42 
use of, 442-3 


Job analysis, 109 
Johnson Upper Lower Index, 439-40 
Judgements, 149 
formation of, 177-8 
bases, 178-9 
importance of, 180 
purpose of, 178-9


Key, 250 
Knowledge objective, 384 
Kothari Commission, 112 
Kuder-Richardson formulas, 97-8, 
433, 435-6 
simplified, 98 


Language, 268-9 
Learnability, 447 
Logical evaluation, 50-51 


Maclay's taxonomy, 131-2 
Macro-evaluation, 59-60 
Marking, 485-6 
conformity check in, 423-4 
expert judgement in, 420-22 
grading in, 423-5 
leniency vs. severity, 422-3 
moderation in 
measures for, 420-4 
methods, 424-7 
by standardisation, 425-6 
by statistical methods, 424-5 
steps for, 421-2 
scheme, 319-20 
preparation of, 390-91 
scoring key and, 378-9 
Mastery test, 134-5 
combined key, 301 
single key, 300-301 
Matching type items 
common errors in, 302 
constructing of 
steps in, 302 
suggestions for, 302-3 
design, 296-7 
usability of, 301-302 
varieties of, 297-301 
McMorris formula, 434 
Measurement, 32-3 
assumptions of, 153-4 
definition of, 147-8 
errors of, 95-6, 151 
causes, 96 
evaluation and, 34, 456 
relations with, 148-9 
levels of, 64-7 




of relationships, 83-93 
standard error of, 99-102, 330, 
339 
estimating formula, 100 
Lord's method of calculating, 
100-101 
statistical concepts in 
data, 67-8 
data organisation, 68-70 
derived scores, 79-82 
statistical measures, 70-79
Measuring instruments,
reliability, 326-40
usability, 324-6
validity, 340-51 
see also Tools 
Mean, 70-71, 75 
Median, 72-3 
Mega evaluation, 59 
Micro-evaluation, 58-60 
Mode, 73-4 
Moderating test, 425 
Moderation, 417-8 
in marking, 
measures for, 420-4 
methods, 424-7 
meaning of, 417 
post-examination, 420-27
purpose of, 417-8
of question papers
dos and don'ts, 419-20
tasks implied in, 418-9
Morrison index, 438-9
Multiple choice type, 246
base of, 248
construction of
suggestions for, 267-81
criticism of, 247
format of, 248-51
matching items, 296-303
ordinary, 248, 250
potentialities of, 251-7 
rearrangement type, 304-6 
varieties of, 257-67 


Mysore Conference 1965, 20 




Nag Chaudhari Committee, 19 
National Council of Education 
Research & Training (NCERT), 
18-19, 383 
approach papers of, 23-5 
taxonomy of, 
objectives, 128 
Navodaya Vidyalayas, 477-8 
Negative stem, 271 
New curriculum see Curriculum, 
approach paper
New Delhi conference 1957, 16-17 
New Education Policy 
emphases of, 472-3 
Nominal scale, 65-6 
Non-formal evaluation, 40 
Non-parametric data, 67 
Norm referenced measurement, 44-5 
Normal probability curve, 93-5 
Nuttal formulas, 436, 439 


Objective-based 
evaluation, 53-4 
testing, 383-4 

Objective-type tests 
facility value 

calculation formula, 437 
matching type items 
multiple choice items 
Pictorial varieties 
Rearrangement item 
Structured questions 


see also constant alternatives 
Objectives 


classification of, 108 
criterion-referenced, 134-6 
desirability of 

approaches, 109-10 
dimensions of, 166-7 

ability definition, 167 
educational 

classification of, 122-4 
elements of, 115-7 
educational, 142 
formulation of, 167-8 
goals and, 107-8 


identification and stating of 




levels for, 112-5 
item bank and, 141-2 
national level, 112 
nature of, 105-9 
selection criteria for, 111-2 
sources and derivation of 
empirical approach, 109-12 
statement of, 
common errors in, 119-21 
taxonomies, 124-34 
Observation 
instruments of, 194-8 
techniques of, 194-8 
Open-book examination, 483-5 
Open objectives, 118 
Oral examinations, 35-6, 189-90 
advantages of, 190-91 
classification of, 190 
history of, 1-2 
purpose of, 190 
validity and reliability of 
improvement, 191-3 
Ordinal scale, 66 
Overt behaviour, 118-9 


Paper Setting 
objectives in 
predominance of, 391-6
see also Question papers 
Parallel test, 425 
Parametric data, 67 
Pay-off evaluation, 55-6 
Percentile
rank, 81-2
score, 82
Perception domain, 132-3 
Performance 
assessment methods of 
measures proposed for, 467-9 
Personality tests, 6 
Physical tests, 2 
Pictorial questions, 307-11 
Placement devices, 202 
Point score method, 229 
Practical test, 36 
Predictive validity, 343-4 
limitations of, 347-9 




Process evaluation, 57-8 
Product evaluation, 56-7 
Product-cum-process evaluation, 
57-8 

Programme of Action (POA), 467-72 
Programme evaluation, 61, 486 
Projective devices, 202-3 
Projects, 170-71 
Psychometric approach, 481 
Psychomotor 

assessment, 37 

domain, 479-80 
Push down principle, 125 


Qualitative evaluation, 48-9 
Quantitative evaluation, 47-8
Quartile deviation, 76-7 
Question bank, 482-3 
Question papers 
blue print of, 389 
designing of, 384-92 
editing and consolidation, 390 
framing of questions, 389-90,
403-4 
moderation of, 419-20 
Question-wise analysis, 391, 401-2, 
414-6 
Questionnaire, 198-200 
Questions, 
characteristics of, 213-4 
classification of, 218-20 
dynamics of, 214-5 
essay-type, 220-28 
facets of, 209-11 
forms of, 387-8 
genesis of, 211-3 
long-answer, 217-32 
objective based, 217-8 
paradigm of, 211, 212f 
short-answer, 232-42 
Quizzes, 475


Radhakrishnan Commission 1948-49,
13 

Rankings, 69, 197 

Rao Committee on Examinations 
(1971), 21-3 


Rating
method, 227-8
scales, 195-6
Ratio scales, 66-7
Rational evaluation, 50-51
Readiness testing, 168-9
Rearrangement items, 304-6
Relationships, 
measures of, 83-93 
Reliability, 95-9, 326-40 
coefficient, 330-2 
computing methods, 97-9 
definitions of, 327-9 
estimation of 
methods for, 336-8 
evaluation of, 
data needed for, 339 
precautions to control, 228-9
of tests 
formulas for, 433-6 
validity and, 326, 329 
Remediation, 168-9, 476-7 
Reporting, 181-2 
Research 
evaluation, 486-7 
Resolution of 1913, 10-11 
Responses, 273-81 
clues and, 274-6 
length clue, 275-6 
Responsive evaluation, 49-50 
Review assignments, 204 
Rice, J.M., 3-4


Sadler Commission 1917-19, 11
Sampling validity, 342 
Sapru Committee 1934, 12 
Sargent Report 1944, 12 
Saupe formula, 98-9 
Scales 
types of, 65-7 
Scaling, 485 


Score cards, 197-8
Scores
confidence trait and, 293, 295
interpretability of, 325
true and error, 331-2
Scoring key
marking scheme and, 378-9
preparation of, 390-91
Secondary Education Commission,
1952-53, 13-4
Self evaluation quizzes, 475
Self-referenced 
judgements, 178 
measurement, 46-7 
Sentence completion clue, 276 
Short-answer questions, 232-37 
concept of, 232-3
see also Very-short-answer 
questions 
Simpson's taxonomy, 129 
Social distance scale, 201-2 
Sociometry, 200-203 
Spearman-Brown
correction formula, 435
prophecy formula, 333-4, 337
Specification
meaning of, 121
Split-half method, 97, 99, 433, 435-6 
Stem, 248 
criteria for, 270-73 
Standard deviation, 77-9 
Standard error of measurement 
(SEM), 99-102, 330, 339 
Standard scores, 79-80 
Statistical analysis 
descriptive, 67 
inferential, 68 
measures, 70-79 
Statistics, 64 
Statistical validity, 342-3
Structured questions, 311-20 
characteristics of, 312-5 
Subject matter 
units of, 353 
Summarisation, 180-81 


Summative evaluation, 42-3, 150, 
151t, 169 


Talent search 
examination, 60 


scheme, 477-8
Taxonomies, 124-34, 383, 479
cognitive domain, 124-8
psychomotor domain, 129-32
Teacher assessments, 472 
Teacher's records, 172, 174 
Teacher's training, 17 
Teaching and testing 


objectives’ place in, 136-7, 138-9t, 140-41
Techniques, 187-8 
inquiry, 193-203 
observational, 194-8 
testing, 188-93 
TenBrink, 33, 94, 171-2, 203 
Terminal objectives, 117
Test 
definition of, 147 
Test results, 429-31 
Testing 
devising situation for, 170-71 
techniques of, 188-93, 204-205, 
206t 
Test-retest reliability, 97 
Tests, 34-5, 429-30 
formulas, 433-6 
norm-referenced, 45 
reliability, 326-40 
requirements for good, 324-51 
usability of, 324-6 
validity, 340-51 
see also, Examinations; unit tests; 
written tests 
Tools, 188, 206 
for analysis, 203-4 
for enquiry, 198-203 
for observation, 194-8 
sociometric, 201-3 
Transfer exercises, 204 
True-false items, 283-296 




see also constant alternatives 


Unit concept, 353-4 
Unit tests, 367-6 
blue-print of, 357-65 
concept of, 354 
duration of, 355 
types of, 354-7 
useful purposes of, 366-9 
UP Committee on Reorganisation 
of Secondary Education 1953, 14 
University Education Commission 
1948-49, 13 
Usability, 324-6 


Validity, 340-51 
definitions of, 340-41 
implications of, 345-50 
limitations 
in use, 350-51 
reliability and, 326, 329
types of, 341-3
categories of, 343-5
Values, 105-6 
Variance, 75-6, 96-7 
sources of, 327-8 
Verbal clue, 275 
Very-short answer questions 
construction of 
suggestions for, 239-42
types of, 238-9
Vocationalisation 
approach paper on, 24-5 


Wolfes adaptation methods, 90-92
Wood's despatch of 1854, 9 
Written tests, 35-6, 145, 188-9, 382 

advantages of, 3 

origin of, 2-3 

see also Examinations 


ERRATA


Page Line 
(vi) 10 from bottom Read school in place of cool 
(vii) 19 from top Read emphases in place of emphasis 
(xii) 03 from top Delete the word ‘which’ 
(xiii) 09 from bottom Delete the word ‘besides’ 
(xv) 04 from top Read achievement in place of achievements 
(31) 13 from top Read a far cry in place of of far cry 
(90) 13 from top Read the formula without minus sign before 6 
(95) 03 from top Read the formula as y= Pa e у 
(190) 03 from top Read Content Subjects in place of Continent 
Subjects 
(210) Last line below Read hours in place of tours 
the graph 
(385) 13 from bottom (ii) Subheading: Read Total marks in place of 
Total time 
(435) Last line (v) K.R. Formula-20: Read in the formula $\sum pq$ in place of $pq$
(437) 03 from top Read F.V. of an item = $\frac{\text{No. of students attempting the items correctly}}{\text{Total number attempting the items}} \times 100$
(465) 19 from top Read article (8.24) in place of (8.21) 
(467) 11 from bottom Read Programme of Action in place of actions 
(468) 17 from bottom Read Deemed University in place of Demand 
University 
(475) 11 from bottom Read evaluatorin 


Please read Evaluation in National Policy on Education (1986) ... running headings on top of pages 465, 467 ...


